ABSTRACT

This report contains four papers describing research
based on the view of mathematical knowledge as a hierarchy of
"rules." The first paper, "The Role of Rules in Behavior," was
abstracted in ED 040 036 (October 1970). The second paper, "A Theory
of Mathematical Knowledge," defends the thesis that rules are the
basic building blocks of mathematical knowledge. These rules operate
at different levels: for example, addition and subtraction at one
level, the idea of inverse operations at a higher level. Mathematical
creativity then consists of combining rules to produce new results.
The third paper, "Deterministic Theorizing in Structural Learning,"
describes experiments based on this hypothesis, in which a subject's
performance on certain tasks is predicted with virtual certainty from
his performance on the component rules. The role of memory is
also discussed, and experiments are described supporting the view of
the mind as an information processor with a fixed capacity. The
fourth paper, "A Research Basis for Teacher Education," argues that
teachers need to know more of the aims of the mathematics they teach,
more of the way mathematical knowledge is structured, and more of the
way students learn. The author's research on structural learning (as
described in the second and third papers) is then summarized, and
possible developments are outlined. (MM)
SUMMARY OF



MATHEMATICS AND STRUCTURAL LEARNING 



INTRODUCTION AND OBJECTIVES 

This research was concerned with two basic problems.

1. The first objective can be stated as a question: "How can
mathematical knowledge be characterized in a way which is at once
behaviorally significant and compatible with what is known about
mathematical structures?" We were interested in pursuing this research
at two levels.

(a) We wanted to further clarify both the nature of rules and
their role in behavior.

(b) We also wanted to explore the possibility of extending our
rule formulation to provide a basis for characterizing more
complex mathematical knowledge.

2. The second major objective involved the development of a
psychological theory, that is, a set of operational definitions and
theoretical assumptions compatible with our rule-based characterizations.

More specifically, we wanted to consider the following problems.

(a) How can one operationally define what rule an individual
is using in terms of the behavior elicited? We had already
proposed a preliminary version of such a definition, and Levine
had done this for the special case of discrimination learning,
but many details still needed to be worked out.

(b) The learner often has available several ways of accomplishing
a particular task, and why he uses the procedure he does use is
not at all clear. Building on some experimental research we
had already completed, we wanted to come up with some fundamental
proposals that would be suitable for detailed experimental testing.

(c) We hoped to explore the fundamental question of how existing
knowledge is combined to make new behavior possible.

To the extent that time and funds allowed, we also planned to
conduct some pilot work to test our theoretical ideas.



RESEARCH OUTCOMES AND IMPLICATIONS 



While we did not initially expect to fully achieve our aims 
during the course of this short project, things progressed more 
rapidly than we had dared hope. 

Part 1.

An entire paper, "Role of Rules in Behavior: Toward an Operational
Definition of What (Rule) Is Learned," now published in
the Psychological Review, is devoted to the first problem. In this
paper, a precise formulation of the notion of a rule in terms of
sets and functions, the Set Function Language (SFL), is proposed.
In particular, the extension of a rule is viewed as a function, or
set of S-R pairs. The rule itself involves a domain, an operation,
and a range. It is argued that this molar formulation cannot be
captured by networks of associations unless one allows associations
to act on (other) associations. This formulation is then used as
a basis for showing how rules are involved in decoding and encoding,
symbol and icon reference, and higher order relationships. Decoding
and encoding are shown to involve insertion into and extraction
from classes, respectively. Reference is viewed in terms
of rules which map equivalence classes of signs into the classes
of entities denoted by these signs. Symbols are shown to involve
arbitrary reference, whereas icons retain properties in common
with the entities they denote. Higher order relationships are
then expressed as higher order rules on rules. This is a direct
generalization of associations on associations.
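
The domain/operation/range description above can be made concrete with a minimal sketch. It assumes finite domains so that the extension can be listed. The class name `Rule`, its field names, and the "add three" example are illustrative choices, not notation taken from the paper.

```python
from dataclasses import dataclass
from typing import Callable, FrozenSet, Hashable, Set, Tuple


@dataclass(frozen=True)
class Rule:
    """A rule viewed as a domain, an operation, and a range (hypothetical encoding)."""
    domain: FrozenSet[Hashable]
    operation: Callable[[Hashable], Hashable]
    range_: FrozenSet[Hashable]

    def extension(self) -> Set[Tuple[Hashable, Hashable]]:
        """Return the rule's extension: its set of (stimulus, response) pairs."""
        pairs = set()
        for stimulus in self.domain:
            response = self.operation(stimulus)
            if response not in self.range_:
                raise ValueError(f"{response!r} falls outside the stated range")
            pairs.add((stimulus, response))
        return pairs


# Illustrative rule: "add three" on the stimuli 0..4.
add_three = Rule(
    domain=frozenset(range(5)),
    operation=lambda n: n + 3,
    range_=frozenset(range(3, 8)),
)
```

Calling `add_three.extension()` lists the five S-R pairs. The same set of pairs could be produced by a different operation, so a rule is not fully determined by its extension.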

Furthermore, a partial solution is posed to objective (2a),
the vexing problem of "what (rule) is learned." Given a rule-governed
class of behaviors, "what is learned" is defined as the class of
rules which provides an accurate account of test data. Empirical
evidence is presented for a simple performance hypothesis based on
this definition.
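
A minimal sketch of this definition follows, assuming a small, explicitly listed pool of candidate rules. The function name `what_is_learned`, the candidate rules, and the test data are all hypothetical, chosen only to show how the class of admissible rules narrows as test data accumulate.

```python
def what_is_learned(candidate_rules, test_data):
    """Return the names of all candidate rules that account for every
    observed (stimulus, response) pair in the test data."""
    return {
        name
        for name, rule in candidate_rules.items()
        if all(rule(stimulus) == response for stimulus, response in test_data)
    }


# Hypothetical candidate rules a subject might be using.
candidates = {
    "double": lambda n: 2 * n,
    "square": lambda n: n * n,
    "add_two": lambda n: n + 2,
}

# One test item cannot separate the three rules: each maps 2 to 4.
one_item = [(2, 4)]
# A second item rules out all but "square".
two_items = [(2, 4), (3, 9)]
```

With `one_item` the function returns all three names. With `two_items` it returns only `"square"`. This is why the definition speaks of a class of rules rather than a single rule.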

There are three major directions in which future research might 
proceed. First, the rule formulation (SFL) itself undoubtedly can 
be further improved. While we feel reasonably confident that the
basic ideas presented in this paper would hold up under further
analysis, additional detail must be added, but only as much as is
absolutely necessary to deal with behaviorally relevant aspects of the
rule construct.

Second, the SFL might profitably be used as an analytical tool 
to help clarify what is involved in many kinds of structured learning 
and performance. Most of the SFL-based research conducted to date 
has concentrated on an analysis of what is being presented, the 
nature of the required outputs, what is being learned, and the
interrelationships between them. While such analyses can, at least to
some extent, be undertaken without the use of the SFL, or for that 
matter any other scientific language, the SFL seems to provide a 
useful framework for putting things into perspective and for helping 
to clarify difficult points. In the author's research a number of
questions have been asked on mathematics learning which seem not
to have been asked previously in any serious way. For example, we 
have found that what is learned in mathematical discovery can sometimes 
be identified and presented by exposition with equivalent results. 
Similarly, we were led, on the basis of an earlier finding, to the 
question of what in the statement of a mathematical rule leads to 
extrascope transfer. 

The SFL needs to be applied more systematically in studies in- 
volving subject matters other than mathematics and, in particular, 
we need to determine where the SFL might profitably be used to
formulate research and where not. There is reason to believe that the SFL
may be applicable only to the extent that the classes of overt stim- 
uli and responses involved can be viewed as discrete (i.e., nonover- 
lapping) and exhaustive entities. While these requirements are met 
throughout much of mathematics and other structured knowledge, this 
may not be the case in such areas as social studies, poetry, and 
even language, where synonymy does not necessarily imply equivalence. 

It is hoped that other investigators will apply the SFL to a wider 
range of tasks and thereby help to clarify further its relative 
strengths and weaknesses. 

Third, theoretical assumptions need to be made and their impli- 
cations need to be drawn out. Although this paper is concerned pri- 
marily with describing a new scientific language, it was not possible 
to completely avoid reference to theoretical assumptions. Thus, the
proposed operational definition of "what is learned" would be
behaviorally meaningless without an application assumption. Fortunately,
there is considerable empirical support for the idea. While such 
an assumption is clearly not sufficient for a theory of structural 
learning, it might nonetheless come to play a central role. Whatever 
form additional theoretical assumptions might take, it seems almost 
certain that they would be more compatible with cognitive (rule-based) 
notions than with those based on neo-associationism. Nonetheless, 
any complete theory of structural learning will undoubtedly require 
reference to such things as the limited capacity of human subjects 
to process information. Without recourse to some such physiological 
capacity, I can see no way in which to explain memory or other aspects 
of information processing. 

Part 2.

An effective answer to objective (1b) is provided by a second
paper, entitled "A Theory of Mathematical Knowledge: Can Rules Account
for Creative Behavior?"

In this paper, we proposed and defended the rather strong thesis 
that rules are the basic building blocks of all mathematical knowledge.

The main purpose of the paper was to indicate how complex mathe- 
matical behavior might be accounted for in terms of finite rule sets. 

Every mathematical system consists of one or more basic sets 
of elements, together with one or more operations and/or relations 
and/or distinguished elements of the basic sets. By capitalizing
on certain logical equivalences it is possible to reduce the
characterizing elements to one basic set and one or more relations.
Consider a simple example: the system whose basic set consists
of three "undefined" elements A, B, C, denoted $\{A, B, C\}$, with A being
distinguished in the sense that it serves as an "identity," and
whose defining relation is
$\circ = \{(A,A) \to A,\ (A,B) \to B,\ (B,A) \to B,\ (A,C) \to C,\ (C,A) \to C,\ (B,B) \to C,\ (C,C) \to B,\ (B,C) \to A,\ (C,B) \to A\}$.

What may be called an embodiment of a mathematical system results
on assignment of meaning to the undefined elements. Thus, in the
example just cited, the undefined terms might correspond to certain
rotations, with A corresponding to a rotation of 0°; B, to a rotation
of 120°; and C, to a rotation of 240°. In this case, the operation
would simply be "followed by." For example, a rotation of 120°
followed by one of 240° results in the same action as a rotation of
0°.
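
As a quick check that the rotations really do embody the three-element system, the sketch below encodes the defining relation as a lookup table and compares it with "followed by" on angles. The names `TABLE`, `DEGREES`, `followed_by`, and `embodiment_is_faithful` are illustrative, and reducing angles modulo 360 is an assumption about how "the same action" is to be read.

```python
# Defining relation of the abstract system: (x, y) -> x o y.
TABLE = {
    ("A", "A"): "A", ("A", "B"): "B", ("B", "A"): "B",
    ("A", "C"): "C", ("C", "A"): "C", ("B", "B"): "C",
    ("C", "C"): "B", ("B", "C"): "A", ("C", "B"): "A",
}

# Embodiment: each undefined element is assigned a rotation in degrees.
DEGREES = {"A": 0, "B": 120, "C": 240}


def followed_by(first: int, second: int) -> int:
    """Rotate by `first` degrees, then by `second`; full turns are the same action."""
    return (first + second) % 360


def embodiment_is_faithful() -> bool:
    """True if every fact in TABLE agrees with 'followed by' on the rotations."""
    return all(
        followed_by(DEGREES[x], DEGREES[y]) == DEGREES[z]
        for (x, y), z in TABLE.items()
    )
```

Here `followed_by(120, 240)` gives 0, matching the fact that B followed by C yields A.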



What kinds of behavior are implied by knowing systems and em- 
bodiments of this sort? And, how can such behaviors be accounted 
for in terms of rules?

First, knowing a system certainly implies the ability to compute
within the system. Thus, for example, given the pair A, B,
the "knower" should be able to give the "sum," B. He should also
be able to do more complex computations, like
$((A \circ B) \circ A) \circ C \to (B \circ A) \circ C \to B \circ C \to A$,
which involve combining individual facts
(i.e., associations). In addition, the knower should be able to
give "differences," i.e., given the sum and one of the "addends,"
he should be able to generate the other addend.

If these were the only kinds of behavior to be accounted for 
one could simply list the facts (rules) involved. But clearly any 
reasonable interpretation of "knowing a system" must also deal with 
relationships as well. For example, mastery of a system would 
surely include the ability to generate the subtraction (difference) 
rule from the addition rule, and vice versa. Knowing that B + C = A, 
for example, should be a sufficient basis for generating the
corresponding subtraction fact, A - B = C.

Relational rules of this sort provide a simple way to account
for such behaviors. Thus, instead of listing all of the subtraction
facts separately it would be sufficient to know the addition facts
together with the relational rule, assuming, as is traditional
in formal linguistics, that individual rules can be composed
(performed in succession).
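
A minimal sketch of such a relational rule follows, assuming the addition facts are stored as a lookup table. The name `relational_inverse` and the table encoding are illustrative and not taken from the paper.

```python
# Addition facts of the three-element system: (x, y) -> x + y.
ADD = {
    ("A", "A"): "A", ("A", "B"): "B", ("B", "A"): "B",
    ("A", "C"): "C", ("C", "A"): "C", ("B", "B"): "C",
    ("C", "C"): "B", ("B", "C"): "A", ("C", "B"): "A",
}


def relational_inverse(addition_facts):
    """Relational rule: from each fact x + y = z, generate the fact z - x = y."""
    return {(z, x): y for (x, y), z in addition_facts.items()}


# All nine subtraction facts are generated rather than listed separately.
SUB = relational_inverse(ADD)
```

For instance, the fact B + C = A yields `SUB[("A", "B")] == "C"`, that is, A - B = C.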

The obvious way to account for such relationships, the way
taken by curriculum developers of the operational objectives
persuasion, is simply to add more rules to the characterization.

There are, however, major problems with this approach. For one
thing, listing a new rule for each kind of relationship would have
a post hoc flavor not likely to add much in the way of understanding
more creative behavior. For each new system (of the same type)
considered, for example, there would be a new relational rule for each
one in the original system. Even granting the economy obtained by
eliminating inverses, and the like, the number of rules could grow
large very fast. This would not be bad in itself assuming that this
is the best one could do. The important question, however, is: Can
one come up with a more efficient account which is at the same time
more powerful, and which allows for some measure of creative
behavior?

To answer this question, first note that knowing how one or 
more systems are related to a given one may provide a basis for 
knowing how to compute in the new systems given how to compute in 
the original. The relationships of interest will generally be
mathematical in nature, but they need not be limited to morphisms.
For example, one system may be a simple generalization of another,
as with cyclic 5 and cyclic 3 groups.

Because of the way particular relationships are defined, however,
this advantage will generally be of a limited sort. With
homomorphisms, for example, the ability to compute in the new system
applies only to the defining operations themselves and not, say, to
their inverses or to relationships between the operations.

A far more powerful and parsimonious characterization results 
by simply allowing rules to operate, not on just ordinary stimuli, 
but on other rules. Such rules may be said to be acting in a higher
order capacity or, in short, to be higher order rules. Although
functions on functions are common in various branches of analysis, 
and their formalization is routine, the idea seems not to have per- 
vaded formal linguistics. The closest linguists have come in this 
regard has been to introduce the notion of a grammatical trans- 
formation between phrase markers, which closely parallels what are 
here called relational rules (e.g., between addition and subtraction).

Consider what higher order rules might suggest in the present 
situation. Suppose that a subject has learned a higher order rule 
which connects each operator (rule) with its inverse. Such a rule 
would connect not only, say, addition of numbers with subtraction, 
but composition of all sorts (e.g., of permutations, rotations,
rigid motions, etc.) with the corresponding inverse operations. 

The defining operation of each system and its inverse may be 
thought of as being distinct rules which are mapped one on to the 
other by this higher order "inverse" rule. Assume, in addition, 
that the subject has learned how to add in system A, the relation- 
ship (e.g., a homomorphism) between system A and system B, and 
also how to form the composition of arbitrary rules (in the rule
set).

In this case, there are all sorts of behaviors that the
(idealized) subject would be capable of. For example, he would be
able to subtract, not only in system A but in system B as well. To
see this, one need only observe that the subject can form the
composition of the rule between systems A and B and the higher order
inverse rule. This composite (higher order) rule in turn allows
the subject first to generate an addition rule in system B and then 
to generate a subtraction rule in system B. This subtraction rule, 
in turn, would allow the subject to subtract. Translated into 
more meaningful terms, a rule set of this sort would imply such 
abilities as finding inverses with rigid motions given only the 
ability to add numbers. But, then, isn't this just what is con- 
sidered as creative behavior? 
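
The derivation just described can be sketched in a few lines. The sketch makes several assumptions not stated in the summary: system A is taken to be addition modulo 3, system B is the set of rotations 0°, 120°, 240°, and the relationship between them is a one-to-one correspondence so that it can be run in both directions. The names `transport`, `inverse`, and `derive_subtraction_in_b` are hypothetical.

```python
def transport(op_a, h):
    """Higher order rule: use the relationship h (A -> B) to generate
    B's operation from A's operation. Assumes h is one-to-one."""
    h_inv = {b: a for a, b in h.items()}

    def op_b(x, y):
        return h[op_a(h_inv[x], h_inv[y])]

    return op_b


def inverse(op, elements):
    """Higher order 'inverse' rule: from an operation, generate the rule
    that finds y given z and x with op(x, y) == z."""
    elements = list(elements)

    def inv_op(z, x):
        for y in elements:
            if op(x, y) == z:
                return y
        raise ValueError("no element satisfies op(x, y) == z")

    return inv_op


def add_mod3(m, n):
    """The one operation the idealized subject is assumed to know (system A)."""
    return (m + n) % 3


# Assumed relationship between system A (numbers) and system B (rotations).
H = {0: 0, 1: 120, 2: 240}


def derive_subtraction_in_b(op_a, h):
    """Composite higher order rule: generate addition in B, then invert it."""
    return inverse(transport(op_a, h), h.values())


rotation_difference = derive_subtraction_in_b(add_mod3, H)
```

Given only `add_mod3`, the relationship `H`, and the two higher order rules, `rotation_difference(0, 120)` returns 240: the rotation that must follow 120° to give 0°. No subtraction rule for rotations was ever listed.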

To summarize, this paper deals with what it means to know an 
existing body of mathematics. Relatively little is said about 
intellectual skills of the sort that must inevitably be involved 
in doing real mathematics. Nonetheless, it is shown that what 
appears to be creative behavior might well be accounted for in 
terms of growing rule sets. The key idea in making this a feasible
and rather attractive possibility is that of the higher order rule. 

Part 3. 

The third paper, "Deterministic Theorizing in Structural 
Learning: Three Levels of Empiricism," deals in an integrated 
fashion with objectives (2a), (2b), and (2c). It also reports on 
some pilot experiments designed to test the basic theoretical 
hypotheses. 

The foundations of three partial theories of structural 
learning are described and some relevant pilot data are reported. 

First, a partial theory of structured knowledge is proposed, in 
which it is argued that the knowledge had by any given subject 
may be characterized in terms of a finite set of rules. By allowing 
rules to operate on other rules (in the set) it is shown how new 
rules can be generated. Examples are also given to show how 
these new rules, in turn, can account for creative behavior. With
the addition of several performance assumptions, this theory is
extended so as to account for learning, performance, and motivation
under idealized conditions where behavior is unencumbered by memory.
Finally, we outline how memory and information processing might be
dealt with, and report some preliminary data in favor of our main
hypothesis.

The theory itself represents a sharp departure from existing
theories of cognitive behavior, although it does have some things
in common with existent competence and information-processing theories.
The differences even here, however, are not minor, but have a
fundamental effect, both on theoretical adequacy and on the very kinds
of empirical questions one asks. Probably the most basic departure
is the idea of introducing different levels of empiricism, and the
possibility of deterministic theorizing at each of these levels.
According to this view, it is possible to do behaviorally relevant
empirical research at at least three quite distinct levels. Although
all competence models, such as those proposed by Chomsky in linguistics,
purport to deal with knowledge, concern traditionally has been
limited primarily to the so-called mature speaker or hearer who
effectively knows all there is to know about the language. In the
present formulation, it is just as reasonable to talk about the 
knowledge had by different individuals, naive ones as well as 
mature. This is an extremely important characteristic in dealing
with subject matters like mathematics, science, or even language, 
where knowledge is not a static thing, but grows with experience. 

An even more basic departure is allowing rules to act on 
other rules. This seems to us to be the only real hope we have 
at present with which to account for creative behavior within an 
algorithmic framework. There is a good deal more detailed work 
to be done, but the main roadblocks appear to be ones of detail 
and not of principle. 

The distinction between idealized theorizing and related
empiricism, on the one hand, and the more complete theory,
including memory, on the other, is equally basic. By ignoring the
effects of memory and information processing capacity, for
example, it has been possible to deal with quite complex behavior,
such as problem solving and motivation, in a very precise way
and, even more important, in near deterministic fashion.

In the memory-free theory, the main task is one of intro- 
ducing mechanisms of idealized performance, learning, and moti- 
vation, thereby extending the theory of knowledge so that it deals 
explicitly with the way in which available knowledge is put to use. 
This more encompassing theory is still a partial theory, however, 
one which applies only where subjects are unencumbered by either 
memory or their intrinsically limited capacity to process
information. It should be emphasized, however, that it is a theory
which is assumed to apply no matter what knowledge an idealized
subject has available. Thus, even though the knowledge had by
different individuals may vary greatly, the same theory of
idealized behavior is assumed to hold over all individuals.

The basic assumption on which this theory rests is that 
people are goal-seeking information processors. 

The theory deals with three basic kinds of situation: One 
type of situation is where the subject knows one or more rules 
which apply in the given goal situation. The second is where 
the subject does not explicitly know a rule which applies in 
the goal situation. The third is actually a refinement of the 
first, and deals with the question of why, when a subject has
more than one rule available, he selects the rule that he does.
Why not one of the others? These problems are closely allied with
what have traditionally been called performance, learning, and 
motivation, respectively. 

The first case is simplest to deal with. We need only 
assume that:

(A) Given a goal situation for which a subject has at least
one rule available, the subject will apply one of the rules.



Thus, for example, if a subject's goal is to find the sum 
of two numbers, and he knows how to add, then he will actually 
use an addition rule. 

As simple as it appears, this assumption has a number of 
important implications. One is that it provides an adequate basis 
for determining what might be called a subject's behavior potential, 
relative to a given class of rule-governed behaviors. Briefly, by
applying this assumption to assessing individual behavior potential
or individualized testing, we have been able to predict a subject's
second test performance on individual items with a high degree of
accuracy. (Precisely how this was done is described in detail in
the paper.) In a total of 204 cases, utilizing a variety of tasks
and subjects of greatly differing abilities and grade levels (from
the preschool through graduate school), we have been able to
predict second test performance 197 times, or with 97% accuracy.

The results of this research could be particularly useful for 
constructing refined diagnostic tests in many areas of psycholog- 
ical and educational testing. 

In the second case, the subject has not explicitly learned 
a rule for achieving a given goal. He has a problem in the classical 
sense — a problem situation, a goal, and a barrier between them. 

The major theoretical problem is to explain what happens 
when a subject is confronted with such a situation. 

As a first approximation at least, it again appears that 
a very simple mechanism may suffice. This mechanism may be framed 
as a hypothesis as follows: 

(B) Given a goal situation for which the subject does not 
have a learned rule immediately available, control temporarily 
shifts to the higher order goal of deriving a procedure which 
does satisfy the original goal condition. 

With the higher-order goal in force, the subject presumably 
selects from among the available and relevant higher order rules 
in the same way as he would with any other goal. Furthermore, 
where no such higher order rules are available , we assume that 
control reverts to still higher order goals. Theoretically, this 
process could continue indefinitely. 

To complete things, a third hypothesis is needed which allows
control to revert back to the original goal once the higher order
goal has been satisfied. We can state this as follows.

(C) If the higher order goal has been satisfied, control re- 
verts back to the original goal. 

These assumptions provide an adequate basis for generating
predictions in a wide variety of problem solving situations.
Suppose, for example, that the problem posed to a subject is to
convert a given number of yards into inches. Here, we assume that
the subject has mastered one rule for converting yards into feet,
and another for converting feet into inches. The subject is also
assumed to have mastered a higher order rule which allows him to
combine learned rules (in which the output of one matches the
input of the other, as is the case, for example, with rules for
converting yards into feet and feet into inches) into single
composite rules.

In a situation of this sort, the subject does not have an
applicable rule which is immediately available, and, hence,
according to hypothesis (B), he automatically adopts the higher
order goal of deriving such a procedure. Then, according to the
simple performance hypothesis (A), the subject selects the higher
order composition rule and applies it to the rules for converting
yards into feet and feet into inches. This yields a new composite
rule for converting yards into inches. Next, control reverts to
the original goal by hypothesis (C) and, finally, the subject
applies the newly derived composite rule by hypothesis (A) to
generate the desired response.
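
The sequence of control shifts in this example can be traced with a small sketch. It is one possible reading of hypotheses (A), (B), and (C), not the mechanism as specified in the paper. The dictionary encoding of rules and the names `make_rule`, `compose`, `find_rule`, and `solve` are hypothetical.

```python
from itertools import permutations


def make_rule(src, dst, fn):
    """A learned rule: converts a quantity in unit `src` to unit `dst`."""
    return {"from": src, "to": dst, "apply": fn}


def compose(first, second):
    """Higher order rule: combine two rules when the output of the first
    matches the input of the second; otherwise return None."""
    if first["to"] != second["from"]:
        return None
    return make_rule(first["from"], second["to"],
                     lambda x: second["apply"](first["apply"](x)))


def find_rule(rules, src, dst):
    """Return a rule that satisfies the goal src -> dst, if one is available."""
    for rule in rules:
        if rule["from"] == src and rule["to"] == dst:
            return rule
    return None


def solve(src, dst, value, rules):
    """Return (response, trace), where trace lists the hypotheses invoked."""
    trace = []
    rule = find_rule(rules, src, dst)
    if rule is None:
        trace.append("B")  # no rule available: shift to the higher order goal
        trace.append("A")  # apply the available higher order (composition) rule
        derived = None
        for first, second in permutations(rules, 2):
            candidate = compose(first, second)
            if candidate is not None and candidate["to"] == dst and candidate["from"] == src:
                derived = candidate
                break
        if derived is None:
            raise LookupError("no composite rule satisfies the goal")
        rules.append(derived)  # the newly derived rule joins the rule set
        trace.append("C")  # higher order goal satisfied: revert to original goal
        rule = derived
    trace.append("A")  # apply the rule that satisfies the original goal
    return rule["apply"](value), trace


known_rules = [
    make_rule("yards", "feet", lambda yards: yards * 3),
    make_rule("feet", "inches", lambda feet: feet * 12),
]
first_response, first_trace = solve("yards", "inches", 2, known_rules)
second_response, second_trace = solve("yards", "inches", 5, known_rules)
```

On the first problem the trace is B, A, C, A and the response is 72. On the second problem the derived rule is already in the rule set, so the trace is just A and the response is 180.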

Preliminary empirical support for these hypotheses is
reported in the paper.

The important point of all this is that learning can be 
viewed as a problem-solving process. Subjects learn as a result 
of being exposed to problem situations which require that they
combine available rules in new ways. Once a problem has been
solved, however, no further learning is assumed to take place
upon repeated presentations of similar problems. In that case,
the subject simply applies the newly learned rule. 

On the basis of these assumptions, it would be possible to 
derive all kinds of implications about learning and performance. 

In particular, highly specific predictions might be made about
individuals who enter the learning situation with given sets of
rules and who are then subjected to particular sequences of problem
situations. Such analyses would have obvious implications for
instructional theory.

The third case is concerned with what happens where a subject 
has more than one way of achieving a given goal and we want to 
know which way he will choose. It was assumed in this case that 
the subject would use one of the available rules (Hypothesis (A)), 
but nothing was said about which one. It is our contention that 
the answer to this question of "which one" lies at the base of
what we normally think of as motivation, especially as it is 
realized in structural learning and performance. 

We worked on this problem for some time, and at first we were
not particularly pleased with our results. To be sure, our pilot
data almost always supported our hypotheses in a gross probabilistic
sense, but they could hardly be called deterministic. By using past
selections as a guide, we have been able to do much better and
have recently been able to determine what rules or parts of rules
a subject selected with an accuracy rate of about 85%.

To recapitulate, it should be re-emphasized that everything
which has been said so far about learning, performance, and
motivation only applies in situations where memory and the limited
capacity of human subjects to process information do not enter. 

The proposed mechanisms have all assumed an information processor 
with an essentially unlimited ability to process information, and 
with perfect memory for previously acquired knowledge. 

This definitely does not imply that the theorizing is of
little value. That conclusion would be wrong on at least two
counts. First, there are many practical situations in structural
learning where memory is of minimal concern. In problem solving,
for example, the subject is almost always given all of the paper,
pencils, and other memory aids that he needs. Typically, we also
do our best to insure that the necessary lower-order rules are
readily available, even to the extent of making textbooks available.
The concern is generally with whether or not the individual can
integrate available knowledge to solve problems. Considerations
such as whether he can do it in his head or not, time to solution,
and so on, are of secondary concern. Second, questions of memory
can usually be eliminated in experimentation by insuring that
relevant rules and memory aids are available to the subject. This
can normally be accomplished by training.

The mechanisms of memory and information processing proposed 
in the paper are speculative and subject to revision. Nonetheless, 
they are simpler and potentially more precise than those of existing 
information processing theories. Furthermore, the theory is de- 
signed primarily to apply to memory and information processing 
with complex structured materials, and not just with the short-term
memory of lists of nonsense syllables, simple words, or sentences,
as has been the case with most modern memory research.

Finally, we mention some of the most promising areas of
application of this work in education. Insofar as curriculum
construction is concerned, it is sufficient simply to reemphasize
that it is a small conceptual step from characterizing knowledge
of individual subjects in terms of rules to characterizing
curricula in terms of operational objectives. Unlike the current
list type of curricula, however, explicit attention might be given
to the identification of higher order relationships. As simple
as this change may seem, its importance cannot be overemphasized.
It makes it possible not only to build a good deal of transfer
potential directly into a curriculum, but also to capture, we
think, what subject matter specialists almost uniformly feel
has been missing in current curricula of the operational objectives
variety: the creative element. We have a pilot project under way
at Penn at this time, in which we are attempting to apply
these ideas to teaching mathematics to elementary school teachers.
It is too soon to say how things will actually turn out, but so
far things have been going extremely well and we hope that we
will be able to teach more sophisticated mathematics in this
way, and to teach it more effectively.

A second major implication has to do with testing, particularly
that sort of testing used to determine mastery on the objectives
which go to make up curricula of the sort indicated.
Here, the groundwork has been all but completed, and application
would seem to be a rather straightforward operation. In fact,
we are actually utilizing these ideas in another small-scale
developmental project aimed at diagnosing difficulties urban
youngsters are having with the basic arithmetical skills. Another
phase of this project has to do with remediation of these
difficulties. In this regard, we are using our own home-grown
version of hierarchy construction. What we do, in effect, is
simply to identify the particular algorithm (rule) we want to 
teach the child, and break it down into atomic sub-rules. Each 
sub-rule, in turn, is broken down in the same way, until we 
reach a level where we can be sure that all of our subjects have 
all of the necessary competencies. This breakdown corresponds 
directly to the hierarchies obtained in the usual manner by 
asking Gagne's often quoted question, "What must the learner 
be able to do in order to do such-and-such?" Unlike the
traditional approach, however, ours provides a natural basis for
constructing alternative hierarchies (since any number of
procedures may be used to generate the same class of behaviors).
Possibilities also exist in such areas as teaching problem 
solving, but our work to date has been limited to testing basic 
hypotheses. 

Part 4. 

A final paper, "A Research Basis for Teacher Education," goes
beyond the immediate scope of the proposed research. It is directed
to professional educators and attempts to provide a broader
perspective concerning the problems and their possible resolutions.
More specifically, the purpose of this paper is (1) to indicate
why basic research in mathematics (and subject matter) education
is badly needed, (2) to identify some of the kinds of information
which every good mathematics teacher needs, (3) to describe some
of the basic research which we have under way and also to mention
some of the implications of this research for further development
in mathematics education and behavioral research generally, and
(4) to describe some of our current developmental activities in
teacher education in mathematics.



ROLE OF RULES IN BEHAVIOR: 

toward an operational definition of 

WHAT (RULE) IS LEARNED 1 

JOSEPH M. SCANDURA 
University of Pennsylvania 

A precise formulation of the notion of a rule in terms of sets and functions
is proposed. It is argued that this molar formulation cannot be captured by
networks of associations unless one allows associations to act on (other)
associations. This formulation is then used as a basis for showing how
rules are involved in decoding and encoding, symbol and icon reference, and
higher order relationships. Decoding and encoding are shown to involve
insertion into and extraction from classes, respectively. Reference is
viewed in terms of rules which map equivalence classes of signs into the
classes of entities denoted by these signs. Symbols are shown to involve
arbitrary reference, whereas icons retain properties in common with the
entities they denote. Higher order relationships are then expressed as
higher order rules on rules. This is a direct generalization of associations
on associations. Finally, a partial solution is posed to the vexing problem
of “what (rule) is learned.” Given a rule-governed class of behaviors,
“what is learned” is defined as the class of rules which provides an accurate
account of test data. Empirical evidence is presented for a simple performance
hypothesis based on this definition.



During the past few years there has been
a gradual shift of emphasis in psychology
from the study of simple to complex learning.
Even where investigators are still
working primarily with simple tasks, such
as the learning of paired-associate lists, the
questions being asked seem to have broader
significance.

This shift has not come, however, without 
attendant difficulties. While existing theo- 
ries are clearly inadequate for dealing with 
complex structural learning, there are other, 
even more basic, problems which have not 
yet been adequately resolved. In particular,
there has been no scientific language with 
which even to talk about many of the prob- 
lems. The general question of the relative 
efficacy of discovery and expository learning 

1 Portions of this article were presented at the
meeting of the American Psychological Association,
Washington, D. C., September 1967. The
author would like to thank John H. Durnin for
his general assistance in the preparation of this
article.

An unabridged version of the present paper can 
be obtained on request from the author. 



(e.g., Gagne & Brown, 1961; Wittrock, 
1963) provides a ready example. The research
has not only been confounded by differences
in terminology, but also by the frequent
use of multiple dependent measures
and vagueness as to what is being taught
and discovered (Roughead & Scandura,
1968). Similar statements may be made
about arguments for and against specific 
versus general training (e.g., see Scandura, 
Woodward, & Lee, 1967). 

In trying to add precision to their formu- 
lations, most investigators to date have taken 
one of two paths. Some have chosen to
elaborate on or to extend the S-R mediational
language (e.g., Berlyne, 1965; Staats
& Staats, 1963). Others have shamelessly
preferred more cognitive, or rule-based,
formulations (Bartlett, 1932, 1958; Mandler,
1962, 1965; Miller, Galanter, & Pribram,
1960).

Which approach is to be preferred is
perhaps based more on a philosophy of science
than on psychology per se. The former
approach appeals more to those who want
their theories and basic formulations
grounded in empirical data. They have a
precise language now, which relates specifically
to behavior, and do not want to
give it up without good reason. Presuma- 
bly, they would rather improve it as to 
detail than to discard the whole idea. Cognitive
formulations generally conform more
closely to intuition about psychological
processes, but they too have major disadvantages.
On the one hand, more traditional
cognitive theories (e.g., Bartlett,
1958; Flavell, 1963; Tolman, 1949) have
been extremely vague as to their relation- 
ships to behavior. Precise languages have 
been almost nonexistent. Modern informa- 
tion processing theories (e.g., Hunt, 1962; 
Newell, Shaw, & Simon, 1958; Reitman, 
1965), on the other hand, which use the 
computer as a model, have been formulated
in precise terms (computer programs). The 
problem here is that it is not at all clear 
how specific aspects of programs relate to 
human behavior — if indeed they do at all. 
Most of what has gone into such programs 
exists as much for programming convenience 
as for modeling human behavior, and it is 
anyone’s guess what are the really important 
ingredients. In order for a language to be 
maximally useful, it must be pruned of 
excess and possibly misleading notational 
baggage. 3 

Over the past several years, a precise 
formulation of the notion of a rule has 
evolved. Since this formulation involves 
sets and functions, and since these character- 
izing notions have been used by the author 
and some of his students in formulating re- 
search, the label Set-Function Language 
(SFL) has been used. The SFL retains 
many basic tenets of cognitive formulations, 
but like all scientific languages, is free of 
specific theoretical assumptions. In addi- 
tion, the SFL is based on extremely basic, 
and highly general, notions (sets and func- 
tions), so that it deals only with essential 
aspects of the constructs and empirical phe- 
nomena involved. 

3 In this regard, Shaw (1970) has recently pre- 
sented cogent arguments to the effect that under- 
standing computer programs, which model human 
behavior, is likely to be just as difficult as under- 
standing the human behavior itself. Computer 
simulation, in effect, is not an adequate substitute 
for theory construction in psychology. 



The purpose of this paper is to describe
this formulation (of a rule) and to show
how it provides for a number of features
involved in the learning of complex structured
knowledge: decoding and encoding
processes, (sign) reference, and higher order
relationships. Finally, with the addition of
an extremely weak theoretical assumption
about how subjects (Ss) perform, a partial
solution to the important problem of “what
(rule) is learned” is proposed.

The Set-Function Language (SFL)
Two Preliminary Observations 

During the summer of 1962, Greeno and
Scandura (1966) found in a verbal concept
learning situation that transfer occurred
on the first presentation of a new
item or not at all. Specifically, they had
their Ss learn common responses (nonsense
syllables) to each stimulus exemplar
(nouns) of varying concepts. After each
S-R pair had been learned, a transfer list
was presented containing one new instance
of each concept from the first list together
with a paired control. The Ss either gave
the correct responses to new concept exemplars
on the first learning trial, or they
learned the items at the same rate as their
controls. The data were consistent with the
hypothesis of all-or-none transfer.

It later occurred to Scandura that Ss
might also transfer on an all-or-none basis
to new instances of rules in which the
stimuli may be paired with different responses.
In this case, one new instance of
a rule could be used as a test to determine
whether the rule is learned, thereby making
it possible to predict the responses to other
(new) stimuli associated with the rule.

To test this point, a number of pilot studies
were conducted during 1963 (Scandura,
1966, 1967a, 1969a); in one experiment
(Scandura, 1969a), a total of 15 (highly
educated) Ss overlearned the list shown in
Figure 1. Prior to learning the list, both the
Ss and the experimenter agreed on the relevant
dimensions and values: size (large-small),
color (black-white), and shape (circle-triangle).
The Ss were told to learn the
pairs as efficiently as they could, since this
might make it possible for them to respond
appropriately to the transfer stimuli. After
learning, the Test 1 stimuli were presented
and the Ss were instructed to respond on the
basis of what they had just learned. Positive
reinforcement was given no matter what
the response. Then, the Test 2 stimuli were
presented in the same manner. The results
were clear-cut: all but three of these Ss
gave the responses “black” and “large,”
respectively, to the Test 1 stimuli (see Figure 1)
and also responded with “white” and
“small” to the Test 2 stimuli.

[Figure 1. Sample learning, assessment (Test One), and prediction
(Test Two) stimuli and responses.]

On what basis could this happen? It was
surely not a simple case of stimulus generalization;
the responses did not depend solely
on common stimulus properties. The first
Test 1 stimulus, for example, is as much like
the fourth learning stimulus as the first.
Perhaps the simplest interpretation of the
obtained results is that most of the Ss discovered
the two underlying principles during
List 1 learning and later applied them to
the test stimuli. These principles might be
stated, “If (the stimulus is a) triangle, then
(the response is the name of the) color” and
“if circle, then size.” In effect, whenever
an S responded to the first test stimulus in
accordance with one of these principles, he
almost invariably responded in the same way
to the second. Since this study was conducted,
a relatively large amount of relevant
data has been collected with essentially the
same results (Roughead & Scandura, 1968;
Scandura, 1967b, 1969b; Scandura & Durnin,
1968; Scandura et al., 1967).

The second observation was that each of
Gagne's (1965) eight types of learning could
be represented by a set of ordered stimulus-response
pairs (Scandura, 1966, 1967a,
1968) in which each stimulus was paired
with a unique response. That is, each type
conformed precisely to the set-theoretic definition
of the mathematical notion of a function.
To see this, first recall Gagne's eight
types of learning: (1) signal learning: the
establishment of a conditioned response,
which is general, diffuse, and emotional, and
not under voluntary control, to some signal;
(2) S-R learning: making very precise
movements, under voluntary control, to very
specific stimuli; (3) chaining: connecting
together in a sequence two (or more) previously
learned S-R pairs; (4) verbal association:
a subvariety of chaining in which
verbal stimuli and responses are involved;
(5) multiple discrimination: learning a set
of distinct chains which are free of interference;
(6) concept learning: learning to
respond to stimuli in terms of abstracted
properties like color, shape, and number; (7)
principle (rule) learning 4: acquiring the
idea involved in such propositions as “If A,
then B” where A and B are concepts, that
is, a chain or relationship between concepts,
internal representations (of concepts) rather
than observables being linked; (8) problem
solving: combining old principles so as to
form new ones.

4 Gagne has not made a distinction between rules
and principles.

The first four types clearly involve a
single stimulus and a single response.
(Chaining and verbal associations, of course, 
may involve intermediary steps.) Multiple 
discrimination simply refers to a set of dis- 
crete S-R pairings (possibly with inter- 
mediate steps), each of which may act 
independently of the others and, hence, must 
be represented as a separate entity. Know- 
ing a concept, however, may involve any 
number of different stimuli (exemplars), 
and each of these stimuli is paired with a 
common (unique) response. In addition, 
rules involve multiple responses. The stim- 
uli and responses, however, are not paired 
in an arbitrary way; each stimulus has a 
unique response attached to it (see Figure 
1, for an example). 

In effect a rule can be denoted by a func- 
tion whose domain is a set of stimuli and 
whose range is a set of responses. The con- 
cept and the association become special cases. 
A concept can be represented by a function 
in which each stimulus is paired with a 
common response, while an association can 
be viewed as a function whose defining set
consists of a single S-R pair. 
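
These three cases can be made concrete with a small sketch. Each denotation is written below as a Python dictionary of stimulus-response pairs, so that every stimulus has exactly one response. The stimuli are invented for illustration (Figure 1 is not reproduced here), and the rule used is the pair of principles quoted earlier, "if triangle, then color" and "if circle, then size." The predict function is a hypothetical rendering of the all-or-none idea: one test instance is used to decide whether the rule's response to another instance should be predicted.

```python
from itertools import product

SIZES = ("large", "small")
COLORS = ("black", "white")
SHAPES = ("circle", "triangle")
STIMULI = list(product(SIZES, COLORS, SHAPES))  # (size, color, shape) triples


def apply_rule(stimulus):
    """'If triangle, then color; if circle, then size.'"""
    size, color, shape = stimulus
    return color if shape == "triangle" else size


# Denotation of the rule: a function given as a set of ordered S-R pairs.
rule = {s: apply_rule(s) for s in STIMULI}

# A concept: every exemplar is paired with one common response.
concept = {s: "triangle" for s in STIMULI if s[2] == "triangle"}

# An association: a function whose defining set is a single S-R pair.
association = {("large", "black", "circle"): "large"}


def predict(test1_stimulus, test1_response, test2_stimulus):
    """Predict the Test 2 response only if Test 1 agreed with the rule."""
    if rule[test1_stimulus] == test1_response:
        return rule[test2_stimulus]
    return None  # the rule was not evidenced, so no prediction is made
```

The rule has several distinct responses over its domain, the concept has one, and the association has a single pair, which is the sense in which the latter two are special cases.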

What Gagne (1965) called problem solv- 
ing involves a higher level of analysis. In 
particular, “combining old principles so as to 
form new ones” requires (higher order) 
rules which act on other rules. More gen- 
erally, higher order rules may involve any 
number of combinations (sets) of old rules 
and any number of new ones, paired so that 
there is a unique new rule attached to each 
set of old ones. (Details are deferred to the 
section on higher order rules.) 
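
A higher order rule in this sense can be sketched as a function whose inputs are rules and whose output is a new rule. The example below is an illustrative assumption, not a construction taken from Gagne or from the experiments reported here: the two "old" rules (double, add_one) and the combining rule (combine) are hypothetical.

```python
def combine(first, second):
    """Higher order rule: maps an ordered pair of old rules to one new rule."""
    def new_rule(stimulus):
        return second(first(stimulus))
    return new_rule


def double(n):
    """An old rule: respond with twice the stimulus."""
    return 2 * n


def add_one(n):
    """Another old rule: respond with the stimulus plus one."""
    return n + 1


# Each ordered pair of old rules is attached to a unique new rule.
double_then_add_one = combine(double, add_one)
add_one_then_double = combine(add_one, double)
```

The two new rules differ in their denotations even though they are built from the same old rules, which is why the pairing is defined on ordered combinations.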

Was this only a more formal way of ex- 
pressing what psychologists have said all 
along, that responses are “functionally” dependent
on stimuli? I could not help but
feel that there was a deeper significance. 
Still, defining rules, concepts, and associa- 
tions in terms of their denotative sets, left 
me with the unsatisfactory feeling of not 
knowing what they really were ; or, to put 
it differently, how to characterize the knowl- 
edge underlying the observables. 

A Characterization of the Rule Construct

A function can be defined as a set of 
ordered pairs or as an ordered triple. The 



denotation of a rule (i.e., the class of S-R behaviors
which can be generated by a rule)
seems best characterized by the former type 
of definition, but the rule construct itself 
conforms more closely to the latter type of 
definition involving a set of inputs, a set 
of outputs, and a connecting operation. 

Consider, for example, the task of summing
arithmetic series (e.g., 1 + 3 + 5 + 7
+ 9). In this case, any one of an equivalence
class 5 of overt stimuli (like the sign,
“1 + 3 + 5 + 7 + 9”) may represent the
same number series (i.e., 1 + 3 + 5 + 7
+ 9). Each such equivalence class serves
as an effective (functionally distinct) stimulus.
Effective responses (sums) may similarly
be thought of as equivalence classes of
overt responses (e.g., “25”). The denotation
of the rule, then, consists of the set of
ordered pairs whose first elements are
equivalence classes of representations of
number series, and whose second elements
are equivalence classes of representations of
their respective sums.

Underlying rules are, however, probably 
more naturally thought of not as acting on 
effective stimuli (responses) themselves but 
on properties of the entities denoted by these 
effective stimuli. Thus, for example, the 
property of having “a common difference of 
two between adjacent terms” refers to the 
number series, 1 + 3 + 5, and not to its
name, “1 + 3 + 5.” Note that a distinction
is being made between the entity (e.g., number
series) and the equivalence class of

5 By an equivalence class of overt stimuli (re- 
sponses) or an effective stimulus is meant a class 
of overt stimuli, each of which has the same set 
of defining properties. The term “effective” is
used to emphasize that we are talking about the
stimuli and responses “effectively” operating in
the situation rather than the overt stimuli and
responses themselves. Thus, for example, the
stimuli “5” and “five” would, for most purposes,
count as the same effective stimulus since they both 
represent the same number. The stimuli “5” and 
“6,” on the other hand, would correspond to dif- 
ferent effective stimuli. In previous papers, Scan- 
dura (1966, 1967a) used the term “functionally 
distinct.” 

The distinction between an entity and the sign 
used to represent it will also play a role in the
present analysis. This distinction is first referred
to in the following paragraphs and is explained 
more fully in the section on reference. 
representations of that entity. However,
since there is a one-to-one relation between 
equivalence classes of overt stimuli (the 
signs) and the abstract entities denoted, we 
can ignore the distinction, except in the sec- 
tion on reference, where it plays a central 
role. These properties, in turn, determine 
(via the rule) other properties (of the re- 
sponses). One rule for summing arithmetic 
series, for example, may be represented by 
the expression, [(A + L)/2]N, where A 
refers to the first term, L to the last term,
and N to the number of terms of the arithmetic
series in question. The critical inputs
associated with this rule are triples of values
of the dimensions A, L, and N (e.g., A =
1, L = 7, N = 4). These triples may be
viewed as (composite) properties of the 
entities denoted by the stimuli. We may 
refer to these critical properties as response 
determining (D) properties. The set of out- 
puts consists of response properties (num- 
bers) derived from the properties in D. 
These properties (numbers) determine 
equivalence classes of number names (e.g., 
the number property, 16, which is the sum
of the series, 1 + 3 + 5 + 7, defines the
equivalence class of all signs of the form
“16”). (Notice, however, that these number
properties may also be viewed as properties
of the series themselves. In this role,
the number properties are called sums,
which just happen to be properties of arith- 
metic series which can be derived from other 
presumably more easily determined proper- 
ties, like the first term and the number of 
terms.) 

In effect, a rule may be defined as an 
ordered triple (D, O, R) where D refers 
to the determining properties of the stimuli, 
and O to the combining operation or trans- 
formation by which the derived properties 
(of the responses, R) are derived from the 
properties in D. 
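
The triple (D, O, R) can be illustrated for the series-summing rule [(A + L)/2]N. The sketch below is a minimal rendering under stated assumptions: series are given as Python lists of integer terms, and the helper names (determining_properties, operation, series_set) are hypothetical labels introduced only for this example.

```python
def determining_properties(series):
    """Extract the triple (A, L, N): first term, last term, number of terms."""
    return series[0], series[-1], len(series)


def operation(props):
    """O: the combining operation [(A + L)/2]N on a triple (A, L, N)."""
    first, last, count = props
    # (first + last) * count is always even for an integer arithmetic series.
    return (first + last) * count // 2


# Illustrative arithmetic series (entities denoted by the stimuli).
series_set = [[1, 3, 5, 7], [1, 3, 5, 7, 9], [2, 5, 8]]

D = {determining_properties(s) for s in series_set}  # determining properties
R = {operation(p) for p in D}                        # derived (response) properties
rule = (D, operation, R)                             # the ordered triple (D, O, R)
```

For the example given in the text, the triple A = 1, L = 7, N = 4 is mapped by the operation to the number property 16.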

Parenthetically, note that accounting for 
such behaviors as adding arithmetic series 
in terms of rules is not the same as intro- 
ducing mediating responses and response- 
produced stimuli. In the latter case, the 
basic idea is to provide a detailed account of 



the interrelationships involved in terms of 
(possibly complex) networks of associations. 
Rules treat such relationships at a more 
molar level. That is, rules by their very 
nature act on classes of effective stimuli and 
not on particular stimuli. 

The basic question, of course, is which of
these two alternatives better captures the 
essential characteristics of behavior on struc- 
tured tasks. The first observation cited 
above, taken together with the relatively 
large amount of available data (e.g., Scan- 
dura, 1969a), indicates the behavioral re- 
ality of rules. Scandura found repeatedly 
that performance on any one instance of 
most structured tasks is directly related to 
performance on any other instance of the 
respective tasks. Behavior strongly tends 
to be either uniformly good or bad. (There 
is more that can be said on this point, but 
going into this here would detract from the 
main point.) Accordingly, it would seem 
that when an investigator is interested in 
working with structured tasks, the rule 
would seem to provide the more natural 
conceptual basis. Mediational accounts of 
such behavior tend to be ad hoc as well as 
complex and cumbersome. (In working 
with nonsense materials, on the other hand, 
where it is unclear as to what, if any, rela- 
tionships exist among the instances, some 
resort to associations and their related the- 
ory may be more fruitful.) 

This inadequacy of mediational accounts 
becomes one of principle unless one takes 
a more general view of stimulus and re- 
sponse than has generally been the case. In 
particular, no mediation theorist to the au- 
thor's knowledge has explicitly considered 
as stimuli what amount, in a related context, 
to S-R pairs (i.e., associations). (Note:
Any given entity may serve as either a stimulus
or a response. What the entity is called
in any particular situation depends solely 
on the role it is playing — Hocutt, 1967.) 
To see this, it is sufficient to consider the 
associative connections involved in generating
sums and differences in arithmetic,
together with those connections which relate
addition and subtraction. In this case, we
would have as a minimum such connections
as

    4 + 5 → 9
        ↕
    9 − 5 → 4

where the vertical arrow acts neither on the 
stimuli, 4 + 5 and 9 — 5, nor on the re- 
sponses, 9 and 4, but rather on the associa- 
tions themselves. 

As a second and somewhat more subtle
example, consider the task of adding “4” and
“3” in column addition. If embedded in a
problem like 41 + 32, the tens digit in the sum
is “7.” However, if the problem involves
carrying, like 47 + 35, then the tens digit in the
sum is “8.” In effect, the response given
to the complex “4, 3” depends on the context,
in particular on the previous response.
(In the first problem, the units digits “1”
and “2” sum to “3,” which does not involve
carrying, whereas, in the second problem,
the sum “12” of “7” and “5” does.) This
implies that the effective stimulus in column
addition includes not just the digits in a
particular column but the previous response
as well, specifically “carry” or “no carry.”
In effect, the stimulus in this case is a pair
consisting of either “carry” or “no carry”
paired with the tens digits “4” and “3.”
Thus, “carry, 4, 3” elicits the response “8,”
whereas “no carry, 4, 3” elicits “7.” To see
how these S-R pairs may be viewed as associations
on associations, we need only observe
that mediation theorists have no difficulty
in talking about stimulus properties of
responses (or, equivalently, in saying that
the source of a given stimulus is the previous
response). Hence, in this case, the stimulus
properties of the response “carry,” for
example, may be thought of as eliciting the
compound entity “4” and “3” as the response;
it is the association “carry” → “4, 3,”
then, that serves as the stimulus (in the
second problem) for the response “8.”
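
The dependence of the response on the previous response can be sketched directly. In the fragment below, which is an illustration and not a model proposed in the text, the effective stimulus for each column is the triple (carry, top digit, bottom digit); the function names are hypothetical, and the two addends are assumed to have the same number of digits.

```python
def column_response(carry, top_digit, bottom_digit):
    """Map the effective stimulus to (digit written, carry passed on)."""
    total = top_digit + bottom_digit + (1 if carry else 0)
    return total % 10, total >= 10


def column_add(top, bottom):
    """Add two equal-length digit strings column by column, right to left."""
    carry = False
    written = []
    for t, b in zip(reversed(top), reversed(bottom)):
        digit, carry = column_response(carry, int(t), int(b))
        written.append(str(digit))
    if carry:
        written.append("1")  # a final carry becomes the leading digit
    return "".join(reversed(written))
```

With these definitions, "no carry, 4, 3" yields the digit 7 and "carry, 4, 3" yields the digit 8, so the two problems in the text give 73 and 82, respectively.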

As unfamiliar as this view may seem, this 
is precisely the sort of assumption that 
Suppes (1969) had to make in proving that 
given any finite connected automaton (which 
for present purposes amounts essentially to 



a rule), there is a stimulus-response model
that asymptotically becomes isomorphic to it.
In order to account for rule-governed be- 
havior, then, mediation theorists of neces- 
sity will have to generalize what to date has 
been the traditional view. The section that 
follows on higher order rules represents an 
important generalization of this idea. In 
particular, the view is taken here that “asso- 
ciations on associations” are nothing more 
than a special case of “rules on rules,” such 
as those commonly involved in problem solv- 
ing. 

Decoding and Encoding Processes 

The distinction we have made between 
overt stimuli and responses, on the one 
hand, and properties (of the entities denoted 
by these stimuli), on the other, raises the 
question of how the decoding and encoding
“gaps” are to be filled. In particular,
rules operate on properties of stimuli and 
not directly on overt stimuli (or, more accu- 
rately, on properties of the entities these 
stimuli denote). Similarly, they generate 
properties (of responses), but not the
responses themselves. The rule, N², for
example, operates on the “number of terms”
(a property of number series) and (with 
certain number series) generates a number 
(a property of sets) called the sum. The 
question essentially is one of how to repre- 
sent the process by which stimulus proper- 
ties are determined from overt stimuli and 
how overt responses are determined from 
derived (response) properties. 

Fortunately, this can be accomplished
quite naturally. Each stimulus property de- 
fines a class of overt stimuli (i.e., the class 
consisting of those overt stimuli which de- 
note entities having that property). Hence, 
decoding may be viewed as a process or 
mapping which assigns overt stimuli to par- 
ticular classes. The result of decoding an 
overt stimulus, then, can be viewed as a 
class of overt stimuli. For example, one 
decoding process involved in “perceiving” 
representations of arithmetic series is the 
map which assigns given (representations 
of) series to classes in a way that leaves all
of the “essential” properties invariant (including,
but not limited to, the first, last,
and number of terms). For example, “1 +
3 + 5 + 7” and “one plus three plus five
plus seven” would be assigned to a common
class, since they both represent precisely the
same arithmetic series. Similarly, the stimuli

[stimulus figure] and (24 + 16) ÷ 17,

would almost certainly be viewed by educated
adults as equivalent to

[stimulus figure] and (24 + 16)/17,

respectively, but not to

[stimulus figure] and (38 + 17) ÷ [illegible].

A similar mechanism is required on the
response side for encoding. Once the derived
response properties have been determined,
the question remains as to how the
result is to be made observable. Consider a
situation in which an S, after having determined
the solution to a problem, is expected
to write it down on paper. For simplicity,
let the solution be the number five (a property
of sets) and let the desired response be
the numeral “5.” Clearly, there are many
variations in the way this numeral could be
written which would have no effect whatsoever
on the referent. Each of the allowed
variations in sign refers to the number five.
The encoding process simply amounts to
constructing or identifying one of these signs.
In effect, since each derived property in R
defines a class of observables (i.e., overt
responses), it would appear that the encoding
process might be thought of as “selecting”
one of the functionally equivalent overt
responses in the defined class.

Normally the processes involved in perception
(decoding) and encoding are very
complex. 6 It is important to note, however,
that the difficulties involved are of a practical
nature and are not of principle. In principle,
it is always possible to increase the
depth of analysis further by introducing additional
rules at the beginning of the initially
given rules (for decoding) or at the end
(for encoding). An initial rule, for example,
may be used to derive a property used
in a given rule from still more primitive
properties. Thus, for example, the property,
N, the number of terms in an arithmetic
number series, which is used in the
rule

    [(A + L)/2]N,

may be derived from the more primitive
properties, A, L, and D (the common difference)
by means of the (initial) rule

    [(L − A)/D] + 1.


The notion of a composite rule provides
a ready means for representing multistage
rules of this sort. Thus, if the rules, r_1, r_2,
..., r_n represent n simple rules, such that
the outputs of r_i may serve as inputs of r_{i+1}
(i = 1, 2, ..., n − 1), then the rule
g = r_n ∘ ... ∘ r_2 ∘ r_1 represents the composite
rule. Complex procedures (e.g., see Groen,
1967; Suppes & Groen, 1967), which involve
branching, can be handled in a similar
fashion, but discussion here would be an
unwarranted digression (for details, see
Scandura, in press).
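
A composite rule of this kind can be sketched as ordinary function composition. The example below is an illustration under stated assumptions: it chains an initial rule that derives the number of terms N from the first term A, the last term L, and the common difference D (using the fact that an arithmetic series has (L − A)/D + 1 terms) with the summing rule [(A + L)/2]N. The function names are hypothetical, and integer terms are assumed.

```python
from functools import reduce


def composite(*rules):
    """Return g = r_n ∘ ... ∘ r_2 ∘ r_1 for rules listed in order of application."""
    return lambda x: reduce(lambda value, r: r(value), rules, x)


def derive_count(props):
    """Initial rule: (A, L, D) -> (A, L, N), with N = (L - A)/D + 1."""
    first, last, diff = props
    return first, last, (last - first) // diff + 1


def sum_rule(props):
    """Given rule: (A, L, N) -> [(A + L)/2]N."""
    first, last, count = props
    return (first + last) * count // 2


# The outputs of derive_count serve as the inputs of sum_rule.
sum_from_difference = composite(derive_count, sum_rule)
```

Applied to A = 1, L = 7, D = 2, the composite first derives N = 4 and then yields the sum 16.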

6 It is worth noting that this complexity is
intrinsic and is not unique to the present formulation.
Thus, in S-R mediation language, decoding
corresponds to S (overt) → r_m and encoding, to
s_m → R (overt). In effect, both formulations make
a distinction between overt and effective stimuli,
on the one hand, and overt and effective responses
(i.e., s_m's which elicit overt responses), on the
other. The difference is simply in how the indicated
“gaps” are to be filled. Mediation theorists prefer 
to use associations both for connections between 
the observable world and internal events and be- 
tween internal events. In the present formulation, 
each kind of connection is treated differently. The 
former involve “inserting observables into classes” 
or “extracting entities from them.” Internal events 
are connected by rules. 
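
The view of decoding as "inserting observables into classes" and encoding as "extracting entities from them" can be sketched as a pair of mappings. The fragment below is illustrative only: the vocabulary of number words, the function names, and the choice to represent each class by the series or number it denotes are assumptions made for the example.

```python
WORDS = {"one": 1, "two": 2, "three": 3, "four": 4, "five": 5,
         "six": 6, "seven": 7, "eight": 8, "nine": 9}
NUMBER_TO_WORD = {value: word for word, value in WORDS.items()}


def decode(sign):
    """Assign an overt stimulus to its class, named by the series it denotes."""
    terms = sign.lower().replace("plus", "+").split("+")
    return tuple(WORDS[t.strip()] if t.strip() in WORDS else int(t) for t in terms)


def encode(number, as_word=False):
    """Select one overt response from the class of signs denoting the number."""
    if as_word:
        return NUMBER_TO_WORD[number]
    return str(number)
```

Two overt stimuli fall in the same class exactly when decode returns the same value for both, and either form returned by encode counts as the same effective response.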
Reference

Although I avoided going into details 
above, the nature of the discussion forced 
a recognition of the distinction between 
equivalence classes of signs, on the one hand, 
and the entities denoted by these equivalence 
classes, on the other. This distinction came 
up both in discussing the rule construct itself 
and in discussing the decoding process. In 
the latter regard, we saw that there are two 
distinct senses in which (meaningful) stimuli
may be viewed. (a) Signs may be interpreted
in terms of what they represent.
Thus, signs may be held equivalent if they 
have the same meaning. This view was 
emphasized, as it seems most appropriate in 
dealing with meaningful behavior. (In fact, 
one might possibly define “meaningful” 
stimuli to be stimuli which have clear referents.)
(b) Signs, however, may also be
thought of as (meaningless) entities in their 
own right (with properties of their own). 
In this case, signs are held equivalent ac- 
cording to whether or not they have certain 
properties in common. Even signs like “X
P Z” and “* o +” which have no well-defined
referents, for example, might be
taken as equivalent, since each has three 
perceptually distinct parts. 

The problem of reference, then, in the 
present view, is one of explicating the rela- 
tionship between signs and their referents. 
As can readily be appreciated, this general
question is extremely complex. All we can 
do here is to touch on two important aspects 
of the problem. Specifically, nothing is 
said about signs with ambiguous meanings. 

First, if the meaning of signs is defined in 
terms of denoted entities, how are we to 
know when an S has acquired particular
meanings? There seem to be at least two
ways in which this might be done: (a) by
determining whether or not the subject can
paraphrase or otherwise describe the intended
meaning, and (b) by seeing whether
or not he can perform in accordance with 
the underlying meaning. The referent of 
(equivalence classes of signs like) “snake,” 
for example, is defined as the class of (all) 
snakes. An S might demonstrate his aware- 
ness of the intended meaning, then, by de- 



scribing what a snake is — “a hideous, long, 
thin, squirming animal, with no legs, which 
moves by . . . and whose bite is sometimes 
poisonous. . . .” He might also do this by 
reacting appropriately to a statement (sign 
complex) in which “snake” is embedded. 
Thus, if someone shouts “Snake!” during a 
hike in the outback, the listener is likely to 
evidence through his behavior an awareness 
of imminent danger. He knows the mean- 
ing! The meaning of the relational symbol 
“run,” which refers to the class of all acts 
of running, might be determined in generally 
the same way. Apparently, this approach is 
in some ways similar to Osgood's (1953) 
S-R formulation, in which responses are 
viewed essentially as indicators that signs 
have certain referents. The present view is 
potentially more precise, however, in that 
with signs having highly structured mean- 
ings, the indicators of meaning can be made 
highly specific and unambiguous. Consider, 
for example, the rule statement “[(A +
L)/2]N.” In this case one can test for
the meaning (a rule) by presenting particular
arithmetic number series and seeing if
the S can apply the rule so as to give the
indicated sum (see below). (For more details,
also see Scandura, in press.)
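
The test just described can be sketched in a few lines of Python. This is only an illustration, under the assumption that A, L, and N stand for the first term, the last term, and the number of terms of the series; the function names and the particular test series are hypothetical and do not come from the studies cited.

```python
def rule_sum(first, last, n_terms):
    """Apply the rule statement [(A + L)/2]N to an arithmetic series."""
    return (first + last) * n_terms / 2


def sequential_sum(first, difference, n_terms):
    """Indicated sum, obtained by adding the terms one after another."""
    return sum(first + k * difference for k in range(n_terms))


def shows_meaning(respond, test_series):
    """True if the responses given by `respond` match the indicated sums.

    `respond` stands in for the S: it takes (first, last, n_terms)
    and returns the S's answer for that series.
    """
    for first, difference, n_terms in test_series:
        last = first + (n_terms - 1) * difference
        if respond(first, last, n_terms) != sequential_sum(first, difference, n_terms):
            return False
    return True


# Hypothetical test items, each given as (first term, common difference, number of terms).
TEST_SERIES = [(1, 2, 50), (2, 2, 50), (5, 3, 10)]
```

An S who has the rule available would be modeled by passing `rule_sum` itself as `respond`; an S who merely guesses would fail on at least some of the items.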

The second question is perhaps more cen- 
tral to the present discussion and deals spe- 
cifically with the nature of the connection 
between equivalence classes of signs and 
their meanings. Specifically, is this connec- 
tion rulelike— or would associative connec- 
tions be adequate in all cases? A positive 
answer to this question would lend consid- 
erable additional support for adopting the 
rule as the basic unit of behavioral analysis. 
A negative answer would be a serious blow 
to any such conception. 

To provide an answer, first note that the 
connection between signs and their referents 
can be represented as rules which map prop- 
erties of signs into (other) properties. 
These latter properties, in turn, define 
classes of entities called referents. Thus, 
for example, “snake” or any other equiva- 
lent sign has certain properties which dis- 
tinguish it from other signs. These invari- 
ant properties are precisely those which are 
mapped onto the properties which characterize
(real) snakes (i.e., the latter properties
are what define the class of snakes). The 
class of symbols equivalent to “run” is as- 
signed to its meaning in precisely the same 
way. 

Of course, we could also represent this 
type of connection directly in terms of asso- 
ciations. The real question, therefore, is 
whether or not connections exist which require
for their characterization nondegenerate
rules. (Presumably, representation of
such rules in terms of associations in the
manner described by Suppes (1969) would 
be cumbersome and, in addition, would re- 
quire a generalization of the notion of asso- 
ciation — to include associations on associa- 
tions.) 

As it turns out, there are two funda- 
mentally different kinds of reference in 
which nondegenerate rules are involved. 
One type involves signs that are abstract 
symbols, and the other, icons. 

Before taking a look at symbol reference 
generally, first consider what might be called 
elemental symbols, symbols which are mini- 
mal indicators of meaning. (In the language 
of automata theory and formal systems, such 
symbols are called “letters of the alphabet.”) 
Probably the single most important charac- 
teristic of elemental symbols is that they 
denote arbitrarily. The arbitrary nature of 
symbol reference has both limitations and 
advantages. Perhaps its most important 
limitation is that symbol reference is nongeneralizable.
Thus, for example, there is
no common way in which the numerals “5” 
and “6” refer. The meaning of each symbol 
must be learned separately; knowing that 
“5” denotes the number of elements in 
{00000} does not help in learning that “6” 
denotes the number of elements in {000000}. 
Any other symbol would be an equally valid 
candidate. 

On the other hand, because symbols may 
be assigned arbitrary meanings, they can 
be used to represent highly abstract notions 
in a precise way. Thus, “five apples” refers 
to the class of all sets of five apples, whereas 
“five” refers to the class of all sets of five 
elements; but there is no loss of precision 
associated with the increasing degree of abstraction.
For example, the symbol, “N”
(the set of natural numbers), refers unambiguously
to a still higher order collection.
Abstract relations may be denoted by
symbols with equal ease. Thus, the terms 
“taller than,” “greater than,” and “relation- 
ship between” refer to progressively more 
abstract relations with equal precision. 

Obviously, not all reference is of this 
simple form. If it were, Ss could learn the
meaning of, at most, a finite number of dif- 
ferent symbols and this clearly runs counter 
to what is known about language. In par- 
ticular, there is no upper bound on the 
number of new statements in English (say) 
which can be understood by a mature
knower of the language. What is needed,
therefore, is some mechanism which is sufficiently
rich to provide for this sort of capability.

Rules would satisfy this requirement, of 
course, but it remains to be shown exactly 
how they might be involved. To make the 
discussion definite, consider the task of 
“generating” the meaning of arbitrary nu- 
merals like “35,” “278,” and so on. Clearly, 
composite numerals of this set have mean- 
ings, just as do simple numerals, like “5” 
and “6.” But individuals do not have to 
learn each meaning independently. They 
presumably have rules available for figuring 
out the meanings of even new numerals 
which they have never seen before. 

It is possible to construct a rule for inter- 
preting numerals of arbitrary size, but we 
can make essentially the same point, and 
more simply, by considering numerals with 
no more than two digits. In this case, the 
following rule will work: “Give meaning to 
the units-digit (i.e., the first digit on the 
right); then give meaning to the tens-digit;
next, “multiply” the meaning of the tens-digit
by 10; finally, combine the meaning of the
units-digit with the meaning of the transformed
tens-digit.” In order to interpret this
rule properly, note the following: (a) Knowing
the meanings of the digits 0 through
9 is basic to using the rule. (b) “Multiply
by 10” may be interpreted to mean “Replace 
each element in each set in the denotation of 
the tens digit with 10 elements of the same 
kind.” For example, consider the numeral, 
“35.” In this case, we first give meaning to 
“5,” as above. The same is then done for
“3.” In carrying out the next step, we take
into account sets in the second meaning class.
Thus, corresponding to the set,

{|||},

we construct the set,

{|||||||||| |||||||||| ||||||||||},

where each of the three bundles contains 
precisely 10 vertical lines. For details on 
how such interpretative rules are constructed, 
the reader is referred to Scandura (in press). 
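
As a rough illustration of the two-digit rule stated above, the following Python sketch treats the meaning of each digit as a separately stored string of marks and then applies the four steps in order. The table `DIGIT_MEANING`, the use of "|" as the element, and all function names are assumptions made for this sketch only; the interpretative rules in Scandura (in press) may be formulated quite differently.

```python
# Meanings of the elemental symbols. Each entry is listed separately,
# mirroring the point that such meanings must be learned one by one.
DIGIT_MEANING = {
    "0": "", "1": "|", "2": "||", "3": "|||", "4": "||||",
    "5": "|||||", "6": "||||||", "7": "|||||||",
    "8": "||||||||", "9": "|||||||||",
}


def multiply_by_ten(meaning):
    """Replace each element with a bundle of 10 elements of the same kind."""
    return ["|" * 10 for _ in meaning]


def interpret(numeral):
    """Give meaning to a numeral with no more than two digits."""
    if not 1 <= len(numeral) <= 2 or any(ch not in DIGIT_MEANING for ch in numeral):
        raise ValueError("expected a numeral with one or two digits")
    units = DIGIT_MEANING[numeral[-1]]                           # units-digit
    tens = DIGIT_MEANING[numeral[0]] if len(numeral) == 2 else ""  # tens-digit
    bundles = multiply_by_ten(tens)                              # transformed tens-digit
    return bundles, units                                        # combined meaning


def number_of_elements(meaning):
    """Count the elements in a combined meaning (used only for checking)."""
    bundles, units = meaning
    return sum(len(bundle) for bundle in bundles) + len(units)
```

For “35” the sketch yields three bundles of 10 marks together with five single marks, which is the construction described in the text.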

In general, then, it would appear that 
compound symbols may acquire meaning by 
referral to the meanings of the constituent 
symbols, together with a “meaning gram- 
mar” by which such meanings are combined 
to form rules for interpretation. General 
support for this contention was found in a 
recent study by Scandura (1967b). It was 
shown that where the “grammar” necessary 
for combining the meanings of constituent 
(minimal) symbols has been mastered, 
knowing the meaning of particular constituent
symbols is both a necessary and also
(essentially) a sufficient condition for applying
a rule statement involving these particular
symbols. In this case, the “grammar”
involved the use of parentheses (i.e.,
“work from the inside out”). The originally
naive Ss were trained with neutral materials
[e.g., 3 (5 + 4 (3 + 2))] until they could
reliably work with parentheses. Then, half
of the Ss were trained on the meaning of
unfamiliar signs, like [X], “the largest
integer in X.” Training continued until
they could reliably give the “meaning” of
arbitrary signs of the form [X] (e.g., [6.6],
[7.0], [8.9], etc.). These Ss could almost
invariably apply rules, like [( [X] + [Y])/ 
[Z]], to instances once statements of these 
rules had been committed to memory. The 
Ss who were not given this training on
meaning were uniformly unable to apply the
rule. Presumably, the ability to work with
parentheses can be viewed as a highly encompassing
rule of grammar, one which
makes it possible to integrate the meanings
of a wide variety of kinds of symbols. Once
the meaning of the constituent symbols in
a rule statement (involving parentheses) is
made clear and is available to the S (in 
memory), the “grammar” combines these 
meanings into a unified whole. The state- 
ment, “name the color,” provides a similar 
example. “Name” is a verb phrase which 
refers to a large number of acts of naming. 
“Color” simply indicates what is to be 
named. Intuitive semantics tells us how 
these meanings are to be combined. A task 
for the future will be to make such intuitions 
public. 
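
The way a “grammar” of parentheses combines constituent meanings can be illustrated with the rule statement [([X] + [Y])/[Z]] from the study described above. The sketch below assumes that “the largest integer in X” corresponds to the floor function for the positive values used as examples; the function names and the sample inputs are hypothetical.

```python
import math


def largest_integer_in(x):
    """Meaning of the sign [X]: the largest integer in X."""
    return math.floor(x)


def apply_rule(x, y, z):
    """Apply [([X] + [Y]) / [Z]] by working from the inside out."""
    inner_sum = largest_integer_in(x) + largest_integer_in(y)  # innermost signs first
    quotient = inner_sum / largest_integer_in(z)               # then the division
    return largest_integer_in(quotient)                        # outermost sign last


# The neutral training item works the same way, innermost parentheses first.
neutral_item = 3 * (5 + 4 * (3 + 2))
```

With inputs 6.6, 8.9, and 3.2, for instance, the inner signs give 6, 8, and 3, the quotient is 14/3, and the outermost sign gives 4. An S lacking the meaning of [X] would have no way to carry out the first step, whatever his skill with parentheses.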

In contrast to symbols, icons 7 have prop- 
erties in common with the entities they de- 
note; they denote in a nonarbitrary way. 
This characteristic way in which icons de- 
note has important implications. In the 
first place, some relations seem easier to 
denote using icons than others. Thus, prox- 
imity and relative size can be handled quite 
easily, but, as an example, the relationship
between parents and children can only be
dealt with indirectly. Insofar as mathematics
is concerned, icons seem to be particularly
well suited to representing geometric
ideas where the relationships involved
tend to vary continuously. 

Second, and this is most important here, 
icon reference involves (nondegenerate)
rules. The icons, “|,” “||,” “|||,” “||||,”
etc., for example, can all be mapped onto
their meanings by a common rule. This is
possible just because each icon can be put
into one-to-one correspondence with the elements
of the sets in the corresponding denotative
class of sets. (That is, each set in
the given denotative class contains the cor- 
responding number of elements.) For a 
second example, it is sufficient to note that 
particular properties of relief maps corre- 
spond to features of the terrain they repre- 
sent. These corresponding features provide 
a sufficient basis for constructing general 
rules for interpretation.
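
The contrast with elemental symbols can be made concrete with a short Python sketch. A single rule interprets every tally icon, including ones never seen before, because each stroke is paired with one element. The function name and the choice of "|" as the stroke are assumptions made for illustration.

```python
def interpret_tally(icon):
    """Map a tally icon onto the number of elements it denotes.

    One stroke corresponds to one element, so the same rule
    applies to icons of any length.
    """
    if not icon or set(icon) != {"|"}:
        raise ValueError("expected a nonempty string of strokes")
    return len(icon)


meanings = [interpret_tally(icon) for icon in ["|", "||", "|||", "||||"]]
```

No comparable common rule exists for the numerals “5” and “6,” whose meanings would have to be stored item by item.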

This ability of icons to refer in a gener- 
alizable way, however, is bought at a price. 
Because they are referentlike, icons retain

7 Here, “icon” is used to refer to any still or
moving picturelike representation. While still pictures
may refer to “things” and certain kinds of
“relations,” moving pictures are required to
represent action.

progressively more irrelevant information
when used to represent increasingly abstract 
ideas. Thus, it is easy to find an icon that 
can be used to represent a particular finite 
arithmetic sequence of numbers in which 
the successive numbers increase by a com- 
mon amount. The sequence 1, 3, 5, 7, for 
example, can be represented by the icon,

[icon not reproduced]

However, without the introduction of symbols
of one sort or another, icons are not
capable of representing arithmetic sequences
in general. In this case, the icon would
have to indicate that there is a common
difference between successive terms and that
both the relative size of the first term and
the (common) difference between terms and
the number of terms are irrelevant. Abstracting
from the icon above, we observe
that

[icon not reproduced]

would provide an adequate representation if
it did not specify a relative size between the
first jump and the successive jumps as well
as a specific number of terms (i.e., 4). This
information is irrelevant and, worse, misleading. 8



Higher Order Rules 

It has already been commented that rules 
can be represented in terms of associative 
networks, but only if we allow associations 
to act on other associations (viewed as 
stimuli) (cf. Suppes, 1969). Since associa- 
tions in the present view are nothing more 
than special cases of rules, it seems reason- 
able to also ask whether there is any natural 
rule counterpart to associations on associa- 
tions. In particular, if rules are as basic to 
complex learning as has been suggested, 
then one would suspect that there ought 
to be (nondegenerate) rules which act on 
classes of associations (rather than on single 
associations), or, even better, rules which
act on classes of rules.

Notice that this observation provides us
with another independent check of the power
of the formulation. We have just seen how
rules are involved in reference, and now
we ask whether they are also involved in
higher order relationships, which are analogous
to associations on associations.

To prove the point, we need only demonstrate
the existence of one such higher order
rule. As a simple example, consider the
rules involved in translating from one unit
of measurement into another: yards into
feet, gallons into quarts, quarts into pints, 
weeks into days, and so on. Clearly, there 
are close relationships among many such 
rules which obviate the need to learn all of 
them separately. Knowing how to convert 
yards into feet and how to convert feet into 
inches, for example, is often a sufficient 
basis for converting yards into inches. Furthermore,
for most adults, it makes no difference
what the particular units are. If
told that there are five “apps” in a “blug”
and two “blugs” in a “mugg,” it would be
a simple task to also convert “muggs” into
“apps” (i.e., first multiply by two and then
by five).

8 It should also be apparent that signs evident in
the “real world” are like icons, only more so.
Rather than being two dimensional, however, these
signs have three dimensions. Because of this, the
signs and their referents must have even more
things in common. The rules defining reference,
therefore, are even more general than with icons.
Still, it should be emphasized that “real world”
signs need not refer to identity. To the contrary,
such signs almost invariably refer to broad classes.
Thus, young children let blocks refer to automobiles,
buildings, boxes, and so on. Even “John
Smith,” at a given instant in time, does not refer
to identity but, typically, to John Smith irrespective
of when.

The point is that many people appear able 
to combine pairs of given rules into corresponding
composite rules. Thus, for example,
given rules like, “x yards → 3x feet,”
and “y feet → 12y inches,” many Ss can
combine them to form composite rules, like
“x yards → 3x feet → 12(3x) inches.” (Using
arrows is a convenient way to represent
the denotation of rules. Thus, for example,
x yards → 3x feet is interpreted to mean
{(x yards, 3x feet) | x is a number}.)

One can account for this type of ability 
by introducing a higher order rule, which 
says, in effect, “combine the rules so that the
output of the first serves as the input of the
second.” More specifically, the higher order
rule can be characterized by the triple, D =
a set of pairs of actions (more accurately, a
set of properties which define equivalence
classes of pairs of actions), O = the higher
order action of combining pairs of lower
order actions, and R = the corresponding
set of composite actions. The denotation of
such a rule, then, can be represented:
{(R₁, R₂), R | R₁ and R₂ are (equivalence
classes of) rules, and R is the rule formed
from R₁ and R₂ by composition}.
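
In programming terms, the higher order rule just characterized behaves like a function that takes two rules as input and returns their composite. The Python sketch below is only meant to make that idea tangible; the names are hypothetical, and the conversion factors are the ones given in the text.

```python
def compose(first_rule, second_rule):
    """Higher order rule: the output of the first rule serves as the input of the second."""
    def composite_rule(x):
        return second_rule(first_rule(x))
    return composite_rule


# Lower order rules taken from the examples in the text.
def yards_to_feet(x):
    return 3 * x


def feet_to_inches(y):
    return 12 * y


def muggs_to_blugs(x):
    return 2 * x   # two "blugs" in a "mugg"


def blugs_to_apps(y):
    return 5 * y   # five "apps" in a "blug"


# The same higher order rule yields both composite rules.
yards_to_inches = compose(yards_to_feet, feet_to_inches)
muggs_to_apps = compose(muggs_to_blugs, blugs_to_apps)
```

Note that `compose` never mentions yards, feet, or “muggs.” It operates on the rules themselves, which is the sense in which it is of higher order.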

Ackler and Scandura are presently per- 
forming a study in the University of Penn- 
sylvania laboratory which demonstrates, con- 
clusively in the author’s opinion, the be- 
havioral reality of such higher order rules 
(Scandura, 1970). Given the necessary 
constituent rules, as above, Ss, ranging in
educational level from kindergarten to post- 
graduate work, were able to solve problems 
involving the composite rule if and only if 
they also had available the necessary higher 
order rule for combining pairs of such rules. 
Specifically, if they had already mastered 
the higher order rule, or could be experi- 
mentally trained in its use, as judged by 
their ability to use it on neutral tasks (i.e., 
neutral rule pairs) to form composite rules, 
then they were able to solve the composite 
problems; otherwise, they were not. The
amazing thing about these results is that
they held up with essentially every S. It
was not a question of averaging over indi- 
viduals or tasks. 

Two earlier studies also bear on this issue. 
The first (Scandura, 1967b) has already 
been discussed in the section on reference. 
Suffice it to say here that the rule by which 
the constituent meaning rules (i.e., rules 
which assign meanings to minimal symbols) 
were combined is a higher order rule. 

In a second study, Roughead and Scandura
(1968) were able to identify a higher
order rule, of the sort Gagne and Brown
(1961) had alluded to earlier, for discovering
other rules. This higher order rule
can be stated, 

. . . formulas for the sum of the first n terms of a
series (Σⁿ) may be written as the product of an
expression involving n (i.e., f(n)) and n itself.
The required expression in n can be obtained by
constructing a three-columned table showing: (a)
the first few sums, Σⁿ, (b) the corresponding
values of n, and (c) a column of numbers,
f(n) = Σⁿ/n, which when multiplied by n yields the
corresponding values of Σⁿ. Next, determine the
expression f(n) = Σⁿ/n by comparing the numbers
in the columns labeled n and Σⁿ/n, and uncovering
the (linear) relationship between them. The required
formula is simply Σⁿ = n·f(n) [Roughead
& Scandura, 1968, p. 285].

This rule can also be analyzed in the same 
general way, but the analysis is not as simple
as the examples given above. The main
ideas are sketched and the reader is referred
as before to Scandura (in press) for more
details. (a) The inputs of the higher order
rule are n-tuples of associations (i.e., degenerate
rules) between particular series of
a given form and their respective sums (e.g.,
1 + 3 + 5 + 7 is mapped into 16). (b)
The output rules are also associations, this
time between classes of series (e.g., 1 + 3
+ 5 + . . . + (2n − 1)) and formulas in
n (e.g., n²) by which sums of particular
series of the given form may be determined.
In effect, the higher order rule maps n-tuples
of specific number series-sum pairs of a
given form (e.g., 1 + 3 → 4, 1 + 3 + 5 →
9, 1 + 3 + 5 + 7 → 16, . . .) into output
associations (e.g., 1 + 3 + 5 + . . . + (2n
− 1) → n²).
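
The mapping from particular series-sum pairs to a general formula can be sketched as follows. This Python fragment is an illustrative reading of the quoted rule, not a reproduction of the procedure used by Roughead and Scandura: it builds the three-columned table, fits a linear expression f(n) through the first two rows, checks the fit against the remaining rows, and returns the formula n·f(n). The function name and the use of exact fractions are assumptions.

```python
from fractions import Fraction


def discover_sum_formula(first_sums):
    """Map the first few sums of a series onto a formula for the nth sum.

    `first_sums[k]` is the sum of the first k + 1 terms. At least two
    sums are needed to uncover a linear relationship.
    """
    if len(first_sums) < 2:
        raise ValueError("at least two sums are required")
    # Three-columned table: (sum, n, sum / n).
    table = [(s, n, Fraction(s, n)) for n, s in enumerate(first_sums, start=1)]
    (_, n1, f1), (_, n2, f2) = table[0], table[1]
    slope = (f2 - f1) / (n2 - n1)
    intercept = f1 - slope * n1
    # The relationship between n and sum / n must be linear throughout.
    if any(f != slope * n + intercept for _, n, f in table):
        raise ValueError("no linear relationship between n and sum / n")

    def formula(n):
        return n * (slope * n + intercept)   # sum = n * f(n)

    return formula


odd_formula = discover_sum_formula([1, 4, 9, 16])    # 1 + 3 + 5 + 7 + ...
even_formula = discover_sum_formula([2, 6, 12, 20])  # 2 + 4 + 6 + 8 + ...
```

For the odd-number series the table gives f(n) = n, so the output association is the formula n²; for the even-number series it gives f(n) = n + 1.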

As a final example, note that the inverse 
relation between addition (i.e., the rule) and 
subtraction is but one instance of a higher
order rule by which any binary operation
(e.g., multiplication) can be mapped onto
its respective inverse (e.g., division). 

In each case, higher order rules are in
some sense orthogonal to the lower order
rules on which they operate. Lower order
rules act on classes of stimuli and map them
onto classes of responses. Higher order
rules map classes of rules (or n-tuples
thereof) onto other classes of rules. Of
course, there is no reason to stop at this
second level, and one can easily envision
rules which act on rules which act on rules
. . . , and so on. 

An Operational Definition of 
What (Rule) Is Learned

The question of “what is learned” is tied
inextricably to the question of transfer (e.g.,
Smedslund, 1953). In rule interpretations,
the tendency has been to explain transfer in
terms of “what (rule) is learned.” Such
interpretations, however, have been rightly 
criticized as lacking operational definition. 
On strictly logical grounds it is effectively 
impossible to define in terms of performance 
“what (rule) is learned” in any unique 
sense. There are typically many different 
routes to the same end. For another thing, 
rules frequently have an infinite number of 
instances ; it is practically impossible in such 
cases to test for the acquisition of all but a 
relatively few. 

On the positive side of the ledger, it does 
not appear necessary to know everything 
that an S knows in order to predict what
he will do in a given situation. Much of the
S's knowledge becomes irrelevant once a
goal is specified. Even the lowliest rodent 
has a large number of behavioral capabilities 
(rules). What rules may be applied de- 
pends on what the organism is trying to 
do. In almost all experimental research 
(whether it is based on neo-associationistic 
or more cognitive notions), there is at least 
the implicit recognition that goals, as well as 
the stimulus context, are crucial to experimental
outcomes. When an S fails to do
what is expected of him, he is branded as
uncooperative. Specifically, knowing an S's
goal in any given stimulus situation is tantamount
to specifying a class of rule-governed
behaviors, that is, a class of behaviors which
can be generated by a rule. (There may be
more than one such rule for any given
class.) Thus, for example, knowing that an
S is trying to add (a given pair of numbers)
defines the (rule-governed) class of all pairs
consisting of (pairs of) numbers and their
sums, denoted {[(m, n), (m + n)] | m, n
are numbers}. This class effectively partitions
the set of rules an S has learned into
two mutually exclusive subsets, one including
those rules which can be used for adding
pairs of numbers and the other including
those rules which cannot be so used.

Equally important, an increasing amount 
of evidence (Levine, 1966; Levine, Leitenberg,
& Richter, 1964; Scandura, 1966,
1967a, 1969a) suggests that the relevant
knowledge which underlies mathematical and
other meaningful behavior can often be specified
with a fair degree of precision.

These observations place important restrictions
on the form a truly adequate
operational definition of “what (rule) is
learned” might take. First, it is essentially
impossible to define “what rule is learned” 
in any unique sense. Second, an operational 
definition of what is learned must be formu- 
lated relative to a given class of rule-gov- 
erned behaviors. Third, any such definition 
must be based on performance on a small, 
finite number of instances, and, if possible, 
should be applicable no matter how many 
test instances are employed. 

In view of these restrictions, any attempt 
to define operationally what particular rule 
is learned seems a priori doomed to failure. 
What appears to be needed is a definition 
which takes into account all feasible underlying
rules. Such a definition can be given
by specifying what is learned up to a class
of rules. Thus, given a class of rule-governed
behaviors and that a particular stimulus
in that class elicits the corresponding
response, “what is learned” can be defined 
as that class of rules whose denotations all 
include the given S-R pair. This definition 
may be interpreted to mean that at least one 
of the rules in the class has been used in 
responding to the test item. 

The problem remains of adapting the 
definition to include any number of test
instances. Fortunately, this can be accomplished
directly. Given a particular rule-governed
class, n test instances, and a performance
capability summarized by success
on m of the n test instances (m ≤ n) and
failure on n − m of these test instances (and
assuming that no learning takes place during
testing), then “what (rule) is learned” is
defined as that class of rules which provides
an adequate account of the test data. In
particular, a rule is included in the class if
and only if its denotation (i.e., set of S-R 
instances) includes all of the test instances 
on which success is obtained, but none of 
those involving failure. That is, the charac- 
terization of “what is learned” includes all 
of the rules which might possibly account 
for the fact that S succeeded on some of the
items but not others. 

The definition says nothing, however, 
about which rules S may have used to generate
his failures. It is also worth noting
that if a given rule is in the class “what is
learned,” and is equivalent in generating
power to some finite connected automaton,
then there is a way of determining whether
or not the S can actually use that particular
rule (i.e., whether or not the rule is really
learned). This can be seen at once by recalling
that any such rule can be represented
in terms of a finite set of associations. While
the total number may be large, it is possible
in principle, at least, to test for the acquisition
of each and every constituent association. 9

To see how this definition applies, con- 
sider the (rule- governed) class consisting of 
the arithmetic number series and their respective
sums. Let us first suppose that an
S has demonstrated his ability to find the
sum (2,500) of the arithmetic series 1 + 3
+ . . . + 99. The definition tells us that
the class “what is learned” includes all and
only those rules which provide an adequate
account of this behavior. In this case, the
class would include, among possibly other
rules, each of the following: sequential addition
(applied to arithmetic number series);
the general rule for summing arithmetic
series, denoted [(A + L)/2]N; the rule N²,
which applies to arithmetic series of the
form 1 + 3 + . . . + (2N − 1); the direct
“association” between the series, 1 + 3 +
. . . + 99, and its sum, 2,500. Thus, “what
is learned” might be denoted by the class,

{direct association, N², [(A + L)/2]N, sequential addition, . . .}.

9 In practice, it is usually not necessary to go
to this extreme. The only essential thing is that
the rule in question be represented in terms of a
(finite) set of operating and decision rules, each
of which has a finite domain (cf. Scandura, in
press). Although this point is implicit in what
has been said, it is perhaps not obvious, and I
would like to thank Gerald Goldin for raising the
question.

As more test information is obtained about
an S's performance capability, it will be possible
generally to eliminate rules from this
class. Suppose, for example, that an S is
successful in determining the sum not only
of the original test series, but also (say) of
the series, 1 + 3 + . . . + 47. Then the
size of the class “what is learned” is reduced
accordingly to

{N², [(A + L)/2]N, sequential addition, . . .}.

According to the definition, the direct association
would no longer be allowed, since it
does not apply to the second series. If the
S is successful on still another test instance,
say, on the series 2 + 4 + . . . + 100, then
the class “what is learned” is further reduced
to the set

{[(A + L)/2]N, sequential addition, . . .}.

The rule N² is eliminated since it is not
applicable to the third test series (i.e., 2 +
4 + . . . + 100). Suppose, on the other
hand, that the S is successful on the first two
test stimuli (i.e., 1 + 3 + . . . + 99, and
1 + 3 + . . . + 47), but not the third (i.e.,
2 + 4 + . . . + 100). Then, according to
the definition, not only would the direct association
be eliminated as a feasible rule, but
so would the more general rules [(A + L)/2]N
and sequential addition. In effect, the class
“what is learned” would include only N²,
together with possibly other unidentified
rules which also provide an adequate account
of the behavior.
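
The elimination procedure worked through above lends itself to a short computational sketch. In the Python fragment below, each candidate rule returns a sum when it applies to a series and `None` when it does not, and a rule is retained only if its denotation includes every test instance on which the S succeeded and none on which he failed. The encoding of a series as (first term, common difference, number of terms), the four candidate rules, and all names are assumptions introduced for illustration; the class of feasible rules is, of course, not limited to these four.

```python
def correct_sum(series):
    first, difference, n_terms = series
    return sum(first + k * difference for k in range(n_terms))


def direct_association(series):
    # Applies only to the single series 1 + 3 + ... + 99.
    return 2500 if series == (1, 2, 50) else None


def n_squared(series):
    # Applies only to series of the form 1 + 3 + ... + (2N - 1).
    first, difference, n_terms = series
    return n_terms ** 2 if (first, difference) == (1, 2) else None


def general_rule(series):
    # [(A + L)/2]N, applicable to any arithmetic series.
    first, difference, n_terms = series
    last = first + (n_terms - 1) * difference
    return (first + last) * n_terms // 2


def sequential_addition(series):
    return correct_sum(series)


CANDIDATES = [direct_association, n_squared, general_rule, sequential_addition]


def what_is_learned(test_data, candidates=CANDIDATES):
    """Names of the rules which provide an adequate account of the test data.

    `test_data` is a list of (series, succeeded) pairs.
    """
    def in_denotation(rule, series):
        return rule(series) == correct_sum(series)

    return {
        rule.__name__
        for rule in candidates
        if all(in_denotation(rule, series) == succeeded
               for series, succeeded in test_data)
    }


S1, S2, S3 = (1, 2, 50), (1, 2, 24), (2, 2, 50)  # ... + 99, ... + 47, 2 + ... + 100
```

Calling `what_is_learned` with success on S1 alone retains all four rules; adding success on S2 drops the direct association; adding success on S3 leaves the general rule and sequential addition; and success on S1 and S2 with failure on S3 leaves only N².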

This definition provides a basis for determining
the behavior potential (i.e., the class
of behaviors that an S is actually capable of)
of individual Ss relative to given rule-governed
classes. To see this, we first note that
the rules in the defined class “what is
learned” can frequently be used to generate
behaviors in the given rule-governed class, 
other than the initial test instances. Know- 
ing what rules are learned (i.e., in the de- 
fined class), then, might well be used as a 
basis for making predictions about perform- 
ance on other instances in the rule-governed 
class of behaviors. To make such predictions,
the only theoretical assumption about
performance which seems necessary is that 
if an S has one or more rules available, 
which apply in a given test situation, then he 
will use at least one of them. As trivial as 
this assumption may seem, it is an assumption.
There is no guarantee that just because
an S wants to achieve a particular goal,
and he knows one or more rules which apply, 
that he will necessarily use one of them. 
Furthermore, it is an assumption which may 
well prove to be fundamental to any formal, 
predictive theory based on the rule construct 
(cf. Scandura, in press). 10 

The really basic question, of course, is 
whether or not the actual behavior potential 
of particular Ss is compatible with this view.
Fortunately, Scandura and his associates
have collected a fairly substantial body of
data over the past few years which suggests
that this is the case (Roughead & Scandura,
1968; Scandura, 1966, 1967b, 1969a; Scandura
& Durnin, 1968; Scandura et al.,
1967). Whenever the response given by an
S to one unfamiliar test stimulus was in
accord with a particular class of rules, so
was the response to a second test stimulus
which was of the same “general type” as the
first. It was generally possible to predict
second test behavior with anywhere between
80% and 95% accuracy. It is encouraging
that other investigators have also found this
sort of assessment procedure useful. Levine
et al. (1964), for example, have used performance
on nonreinforced trials to predict
performance on reinforced trials with a high
degree of success.

10 I originally felt that a stronger assumption of
this sort was needed, in particular, that S will continue
using the same rule as long as his goal
remains unchanged and feedback otherwise indicates
that he is responding in an appropriate manner
(Scandura, 1969b). While this Einstellung-type
assumption may still have some merit, it is
not a necessary requisite for making predictions
about behavior potential.

Furthermore, the results of the Scandura 
and Durnin (1968) study suggest that
actual behavior potential can often be deter- 
mined in a systematic manner. It was found 
that successful performance with two stimuli, 
which differed along one or more dimen- 
sions, implied successful performance with 
new stimuli which differed only along these 
dimensions. In particular, success on two 
instances in a rule-governed class, which 
differ simultaneously along all possible di- 
mensions, implied success on any other test 
instance in the rule-governed class. 

This whole approach undoubtedly over- 
simplifies what is an extremely complex 
problem, but all things considered, it does 
seem to provide a reasonably adequate first 
approximation. The ultimate objective, of 
course, will be to devise a systematic proce- 
dure for determining behavior potential on 
any class of tasks by using a finite testing 
procedure of some sort. In fact, substantial 
progress has recently been made in this 
direction (Scandura, 1970; in press; Scan- 
dura & Durnin, 1970). 

Summary and Needed Research 

A precise formulation of the notion of a 
rule in terms of sets and functions was proposed.
It was argued that this molar formulation
cannot be captured by networks of
associations unless one allows associations
to act on (other) associations. This formulation
was then used as a basis for showing
how rules are involved in decoding and en- 
coding, symbol and icon reference, and 
higher order relationships. Decoding and 
encoding were shown to involve insertion 
into and extraction from classes, respectively. 
Reference was viewed in terms of rules 
which map equivalence classes of signs into 
the classes of entities denoted by these signs. 
Symbols were shown to involve arbitrary 
reference, whereas icons retain properties in 
common with the entities they denote. 
Higher order relationships were then ex- 
pressed as higher order rules on rules. This 
was a direct generalization of associations on 
associations. Finally, a partial solution was 
posed to the vexing problem of “what 
(rule) is learned.” Given a rule-governed 
class of behaviors, “what is learned” was
defined as the class of rules which provides 
an accurate account of test data. Empirical 
evidence was presented for a simple per- 
formance hypothesis based on this definition. 
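
The definition of “what is learned” can be sketched as a filter over candidate rules. The sketch below is an illustration only: it assumes a small, finite list of candidate rules, each a function from stimuli to responses, whereas the paper defines the class in terms of sets and functions generally. All names are hypothetical.

```python
from typing import Callable, Dict, Hashable, Iterable, List, Tuple

# A rule is modeled here as a function from a stimulus to a response.
Rule = Callable[[Hashable], Hashable]


def what_is_learned(
    candidates: Dict[str, Rule],
    test_data: Iterable[Tuple[Hashable, Hashable]],
) -> List[str]:
    """Return the names of all candidate rules that account for the test data.

    A rule accounts for the data when it yields the observed response
    for every tested stimulus.
    """
    data = list(test_data)
    return [
        name
        for name, rule in candidates.items()
        if all(rule(stimulus) == response for stimulus, response in data)
    ]


# Hypothetical candidate rules for a simple numerical task.
candidates = {
    "double": lambda n: 2 * n,
    "square": lambda n: n * n,
    "add_two": lambda n: n + 2,
}

after_one_test = what_is_learned(candidates, [(2, 4)])
after_two_tests = what_is_learned(candidates, [(2, 4), (3, 6)])
```

With the single test pair (2, 4) all three hypothetical rules remain in the class; adding the pair (3, 6) leaves only the doubling rule. This shows why the definition yields a class of rules rather than a single rule, and why further test instances narrow that class.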

There are three major directions in which 
future research might proceed. First, the 
rule formulation (SFL) itself undoubtedly 
can be further improved. While I feel 
reasonably confident that the basic ideas pre- 
sented in this paper would hold up under 
further analysis, additional detail must be 
added— but only as much as is absolutely 
necessary to deal with behaviorally relevant
aspects of the rule construct. (There should 
be emphasis on this point to dissuade com- 
puter enthusiasts from adopting the language 
of computer science wholesale (e.g., autom- 
ata theory) without careful consideration 
of which aspects are important in human 
behavior and which are not.) Work in this 
direction is currently underway and will be 
reported in Scandura (in press). 

Second, the SFL might profitably be used 
as an analytical tool to help clarify what is 
involved in many kinds of structured learn- 
ing and performance. Most of the SFL- 
based research conducted to date (Roughead 
& Scandura, 1968; Scandura, 1966, 1967a, 
1967b, 1969a; Scandura et al., 1967) has 
concentrated on an analysis of what is being 
presented, the nature of the required outputs,
what is being learned, and the interrelationships
between them.¹¹ While such
analyses can, at least to some extent, be 
undertaken without the use of the SFL, or 
for that matter any other scientific language, 
the SFL seems to provide a useful frame- 
work for putting things into perspective and 
for helping to clarify difficult points. In the 
author’s research a number of questions have 
been asked on mathematics learning which 
seem not to have been asked previously in 
any serious way. For example, Roughead 
and Scandura (1968) found that what is
learned in mathematical discovery can some- 
times be identified and presented by exposi- 
tion with equivalent results. Similarly, 
Scandura and Durnin (1968) were led, on
the basis of an earlier finding (Scandura et
al., 1967), to the question of what in the
statement of a mathematical rule leads to 
extrascope transfer. 

The SFL needs to be applied more sys- 
tematically in studies involving subject mat- 
ters other than mathematics and, in particu- 
lar, we need to determine where the SFL 
might profitably be used to formulate re- 
search and where not. There is reason to 
believe that the SFL may be applicable only 
to the extent that the classes of overt 
stimuli and responses involved can be viewed
as discrete (i.e., nonoverlapping) and exhaustive
entities. While these requirements
are met throughout much of mathematics and
other structured knowledge, this may not be
the case in such areas as social studies,
poetry, and even language, where synonymy
does not necessarily imply equivalence. It is
hoped that other investigators will apply the
SFL to a wider range of tasks and thereby
help to clarify further its relative strengths
and weaknesses.

¹¹ I am of the opinion that insofar as structural
learning is concerned, it may be possible, in fact,
desirable, to first concentrate on understanding
what kinds of behaviors might be involved and to
give a distinctly subordinate role to such things as
latency and exposure time. Precious little is
known about what an S might be able to do when
placed in a mathematical situation without complicating
the matter further by trying to predict how
rapidly he can do it or to determine the precise
exposure time needed to bring the behavior about.
In effect, what I am proposing is that ecological
thinking needs to be brought more directly into
theory construction in psychology.

This general type of approach has proved useful
in other sciences. In the early development of
chemistry, for example, it was of considerable
interest to know what kinds of compounds one
might expect to get by mixing various combinations
of elements. Questions as to the precise values of
the boundary conditions of temperature, pressure,
and the like needed for such reactions to take
place were something which could reasonably be
postponed. The first step in theory construction
in structural learning might well follow this path
(see Scandura, 1970).

Third, theoretical assumptions need to be 
made and their implications need to be 
drawn out. Although this paper was con- 
cerned primarily with describing a new 
scientific language, it was not possible to 
completely avoid reference to theoretical as- 
sumptions. Thus, the proposed operational 
definition of “what is learned” would be
behaviorally meaningless without the appli- 
cation assumption. Fortunately, there is 
considerable empirical support for the idea. 
While this assumption is clearly not sufficient
for a theory of structural learning, it
might nonetheless come to play a central 
role. Whatever form additional theoretical 
assumptions might take, it seems almost 
certain that they would be more compatible 
with cognitive (rule-based) notions than 
with those based on neo-associationism. 
Nonetheless, any complete theory of struc- 
tural learning will undoubtedly require ref- 
erence to such things as the limited capacity 
of human Ss to process information (Miller,
1956). Without recourse to some such 
physiological capacity, I can see no way in 
which to explain memory or other aspects
of information processing. (For elaboration, 
see Scandura, in press.) 